
Commit 8e6d572

re-organized the lectures
1 parent 8375baf commit 8e6d572

File tree: 1 file changed

README.md

Lines changed: 49 additions & 11 deletions
@@ -3,7 +3,7 @@
 
 
 
-1. **Fundamentals.**
+1. **Overview.**
 
 
    * Reinforcement Learning
@@ -26,46 +26,84 @@
    * AlphaGo
      [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/1_Basics_5.pdf)]
      [[Video (in Chinese)](https://youtu.be/zHojAp5vkRE)].
+
+
+
 
 
+2. **TD Learning.**
+
+   * Sarsa.
+     [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/2_TD_1.pdf)]
+
+   * Q-learning.
+     [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/2_TD_2.pdf)]
+
+   * Multi-Step TD Target.
+     [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/2_TD_3.pdf)]
+
+
+
+
 
-2. **Advanced Topics on Value-Based Learning.**
+3. **Advanced Topics on Value-Based Learning.**
 
 
    * Experience Replay (ER) & Prioritized ER.
-     [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/2_DQN_1.pdf)]
+     [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/3_DQN_1.pdf)]
+     [[Video (in Chinese)]()].
 
    * Overestimation, Target Network, & Double DQN.
-     [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/2_DQN_2.pdf)]
-
-   * TD Learning Recap & Multi-Step Return.
-     [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/2_DQN_3.pdf)]
+     [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/3_DQN_2.pdf)]
+     [[Video (in Chinese)]()].
 
    * Dueling Networks.
+     [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/3_DQN_3.pdf)]
+     [[Video (in Chinese)]()].
+
+
 
 
+4. **Policy Gradient with Baseline.**
 
-3. **Advanced Topics on Policy-Based Learning.**
 
+   * Policy Gradient with Baseline.
+     [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/4_Policy_1.pdf)]
+
+   * REINFORCE with Baseline.
+     [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/4_Policy_2.pdf)]
 
    * Advantage Actor-Critic (A2C).
+     [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/4_Policy_3.pdf)]
+
+   * REINFORCE versus A2C.
+     [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/4_Policy_4.pdf)]
+
+
+
+5. **Advanced Topics on Policy-Based Learning.**
 
    * Trust-Region Policy Optimization (TRPO).
 
   * Policy Network + RNNs.
 
 
 
-4. **Dealing with Continuous Action Space.**
+6. **Dealing with Continuous Action Space.**
+
 
+   * Discrete versus Continuous Control.
+     [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/6_Continuous_1.pdf)]
 
    * Deterministic Policy Gradient (DPG) for Continuous Control.
+     [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/6_Continuous_2.pdf)]
 
    * Stochastic Policy Gradient for Continuous Control.
+     [[slides](https://github.com/wangshusen/DRL/blob/master/Slides/6_Continuous_3.pdf)]
 
 
 
-5. **Multi-Agent Reinforcement Learning.**
+7. **Multi-Agent Reinforcement Learning.**
 
 
    * Basics and Challenges
@@ -78,7 +116,7 @@
 
 
 
-6. **Imitation Learning.**
+8. **Imitation Learning.**
 
 
    * Inverse Reinforcement Learning.
