
Commit 2048750 (1 parent c0bc82e)
Commit message: edit

2 files changed: +2 lines, -177 lines

Reinforcement_learning_TUT/7_Policy_gradient_softmax/RL_brain.py

Lines changed: 2 additions & 0 deletions
@@ -73,6 +73,8 @@ def _build_net(self):
             neg_log_prob = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=all_act, labels=self.tf_acts)   # this is negative log of chosen action
             # or in this way:
             # neg_log_prob = tf.reduce_sum(-tf.log(self.all_act_prob)*tf.one_hot(self.tf_acts, self.n_actions), axis=1)
+
+            # to maximize total reward (log_p * R) is to minimize -(log_p * R), and TensorFlow only has minimize(loss)
             loss = tf.reduce_mean(neg_log_prob * self.tf_vt)  # reward guided loss

         with tf.name_scope('train'):
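The two ways of computing `neg_log_prob` mentioned in the diff are mathematically the same thing: sparse softmax cross entropy with the chosen action as the label equals the negative log of the softmax probability of that action. A minimal NumPy sketch (not the repo's TensorFlow graph; the logits, actions, and returns below are made-up illustration values) showing the equivalence and the reward-guided loss:

```python
import numpy as np

def softmax(logits):
    # numerically stable softmax over the last axis
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

all_act = np.array([[2.0, 0.5, -1.0],
                    [0.1, 1.2, 0.3]])   # logits: 2 timesteps, 3 actions (made up)
tf_acts = np.array([0, 2])              # action chosen at each timestep
tf_vt = np.array([1.5, -0.5])           # discounted, normalized returns

probs = softmax(all_act)

# Way 1: sparse softmax cross entropy = -log(prob of the chosen action)
neg_log_prob_1 = -np.log(probs[np.arange(len(tf_acts)), tf_acts])

# Way 2: the commented-out one-hot formulation from the diff
one_hot = np.eye(probs.shape[1])[tf_acts]
neg_log_prob_2 = np.sum(-np.log(probs) * one_hot, axis=1)

assert np.allclose(neg_log_prob_1, neg_log_prob_2)

# Reward-guided loss from the diff: minimizing mean(-log_p * R) maximizes
# mean(log_p * R), since optimizers only expose minimize(loss)
loss = np.mean(neg_log_prob_1 * tf_vt)
```

Way 1 is preferred in practice because it is computed from the logits in one numerically stable step, while Way 2 takes a log of already-softmaxed probabilities.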

Reinforcement_learning_TUT/8_Actor_Critic_Advantage/eligibility.py

Lines changed: 0 additions & 177 deletions
This file was deleted.
