1. **Overview.**

    * Reinforcement Learning

    * AlphaGo
    [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/1_Basics_5.pdf)]
    [[Video (in Chinese)](https://youtu.be/zHojAp5vkRE)].

2. **TD Learning.**

    * Sarsa.
    [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/2_TD_1.pdf)]

    * Q-learning.
    [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/2_TD_2.pdf)]

    * Multi-Step TD Target.
    [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/2_TD_3.pdf)]

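The update rules in this chapter fit in a few lines. Below is a minimal, illustrative sketch of the tabular Sarsa, Q-learning, and multi-step TD targets; the toy state/action sizes and hyperparameters are assumptions made up for the example, not code from the slides.

```python
import numpy as np

n_states, n_actions = 16, 4          # toy sizes, chosen only for illustration
alpha, gamma = 0.1, 0.99             # learning rate and discount factor
Q = np.zeros((n_states, n_actions))  # tabular action-value estimates

def sarsa_update(s, a, r, s_next, a_next):
    """On-policy TD: bootstrap with the action the agent actually takes next."""
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (td_target - Q[s, a])

def q_learning_update(s, a, r, s_next):
    """Off-policy TD: bootstrap with the greedy action in the next state."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

def multi_step_td_target(rewards, s_boot, a_boot):
    """m-step TD target: discounted sum of m observed rewards plus a bootstrapped tail."""
    m = len(rewards)
    return sum(gamma**t * r for t, r in enumerate(rewards)) + gamma**m * Q[s_boot, a_boot]
```
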
3. **Advanced Topics on Value-Based Learning.**

    * Experience Replay (ER) & Prioritized ER.
    [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/3_DQN_1.pdf)]
    [[Video (in Chinese)]()].

    * Overestimation, Target Network, & Double DQN.
    [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/3_DQN_2.pdf)]
    [[Video (in Chinese)]()].

    * Dueling Networks.
    [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/3_DQN_3.pdf)]
    [[Video (in Chinese)]()].

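To make the interplay between a replay buffer, a target network, and the double-DQN target concrete, here is a small self-contained sketch. The tabular "networks", buffer size, and function names are assumptions for illustration only, not the reference implementation from the slides.

```python
import random
from collections import deque
import numpy as np

# Toy setup: small tables stand in for the DQN and its target network.
n_states, n_actions, gamma = 16, 4, 0.99
q_online = np.zeros((n_states, n_actions))
q_target = q_online.copy()            # target network: a delayed copy of the online one

buffer = deque(maxlen=10_000)         # experience replay: stores (s, a, r, s_next, done)

def double_dqn_targets(batch):
    """Double DQN: the online net selects the next action, the target net evaluates it,
    which counteracts the overestimation of plain Q-learning targets."""
    targets = []
    for s, a, r, s_next, done in batch:
        a_star = int(np.argmax(q_online[s_next]))                      # selection
        bootstrap = 0.0 if done else gamma * q_target[s_next, a_star]  # evaluation
        targets.append((s, a, r + bootstrap))
    return targets

def sync_target():
    """Periodically copy the online parameters into the target network."""
    global q_target
    q_target = q_online.copy()

# Uniform replay; prioritized ER would instead sample transitions with large TD error more often.
if len(buffer) >= 32:
    batch = random.sample(buffer, 32)
    _ = double_dqn_targets(batch)
```
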
4. **Policy Gradient with Baseline.**

    * Policy Gradient with Baseline.
    [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/4_Policy_1.pdf)]

    * REINFORCE with Baseline.
    [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/4_Policy_2.pdf)]

    * Advantage Actor-Critic (A2C).
    [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/4_Policy_3.pdf)]

    * REINFORCE versus A2C.
    [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/4_Policy_4.pdf)]

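As a rough illustration of how the baseline enters the gradient, here is a tiny tabular REINFORCE-with-baseline sketch using a softmax policy; the parameterization, learning rates, and function names are assumptions for the example rather than the course's reference code.

```python
import numpy as np

n_states, n_actions, gamma = 16, 4, 0.99
theta = np.zeros((n_states, n_actions))   # policy parameters (softmax over logits)
v = np.zeros(n_states)                    # learned state-value baseline
lr_pi, lr_v = 0.01, 0.1

def policy(s):
    logits = theta[s]
    p = np.exp(logits - logits.max())
    return p / p.sum()

def reinforce_with_baseline(episode):
    """episode: list of (s, a, r). Uses the return-to-go minus a learned baseline."""
    G = 0.0
    for s, a, r in reversed(episode):
        G = r + gamma * G                  # Monte Carlo return-to-go from this step
        advantage = G - v[s]               # the baseline reduces variance, not bias
        v[s] += lr_v * advantage           # fit the baseline toward observed returns
        grad_log_pi = -policy(s)           # d log softmax / d logits = one_hot(a) - pi
        grad_log_pi[a] += 1.0
        theta[s] += lr_pi * advantage * grad_log_pi
```

A2C changes mainly the target: instead of the Monte Carlo return G, it bootstraps with r + γ·v(s_next) after each step.
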
5. **Advanced Topics on Policy-Based Learning.**

    * Trust-Region Policy Optimization (TRPO).

    * Policy Network + RNNs.

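TRPO maximizes an importance-sampling surrogate objective subject to a KL-divergence trust-region constraint. The sketch below only evaluates those two quantities for a batch of discrete-action probabilities; the conjugate-gradient step and line search that actually solve the constrained problem are omitted, and the array shapes and names are assumptions.

```python
import numpy as np

def surrogate_and_kl(pi_new, pi_old, actions, advantages):
    """pi_new, pi_old: (T, n_actions) action probabilities under the new/old policy.
    Returns TRPO's surrogate objective E[ratio * advantage] and the mean KL(old || new)
    that the trust region keeps small."""
    idx = np.arange(len(actions))
    ratio = pi_new[idx, actions] / pi_old[idx, actions]   # importance-sampling weights
    surrogate = float(np.mean(ratio * advantages))
    kl = float(np.mean(np.sum(pi_old * np.log(pi_old / pi_new), axis=1)))
    return surrogate, kl

# usage on made-up numbers
pi_old = np.array([[0.5, 0.5], [0.8, 0.2]])
pi_new = np.array([[0.6, 0.4], [0.7, 0.3]])
print(surrogate_and_kl(pi_new, pi_old, actions=np.array([0, 1]), advantages=np.array([1.0, -0.5])))
```
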
6. **Dealing with Continuous Action Space.**

    * Discrete versus Continuous Control.
    [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/6_Continuous_1.pdf)]

    * Deterministic Policy Gradient (DPG) for Continuous Control.
    [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/6_Continuous_2.pdf)]

    * Stochastic Policy Gradient for Continuous Control.
    [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/6_Continuous_3.pdf)]

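The deterministic policy gradient chains the critic's gradient with respect to the action through the actor's parameters. Here is a deliberately tiny sketch with a linear actor and a linear critic so the chain rule is visible; the dimensions, learning rate, and names are assumptions for illustration.

```python
import numpy as np

state_dim, action_dim = 8, 2
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(action_dim, state_dim))   # deterministic actor: a = W @ s
w_s = rng.normal(size=state_dim)                          # toy linear critic: Q(s, a) = w_s@s + w_a@a
w_a = rng.normal(size=action_dim)
lr = 1e-3

def actor(s):
    return W @ s                        # a continuous action vector, no sampling

def dpg_actor_step(s):
    """Deterministic policy gradient: ascend Q(s, actor(s)) w.r.t. the actor parameters
    by chaining grad_a Q (here simply w_a, since the critic is linear in a) through a = W @ s."""
    global W
    grad_a_q = w_a                      # dQ/da for the linear critic
    W = W + lr * np.outer(grad_a_q, s)  # dQ/dW = outer(dQ/da, s)

s = rng.normal(size=state_dim)
dpg_actor_step(s)
print(actor(s))
```

With a stochastic policy for continuous control, the actor would instead output the parameters of a density (e.g. a Gaussian mean and standard deviation), sample the action from it, and apply the ordinary policy gradient.
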
7. **Multi-Agent Reinforcement Learning.**

    * Basics and Challenges

8. **Imitation Learning.**

    * Inverse Reinforcement Learning.