Description
I just found out that the original version of the Lunar Lander example was able to land successfully sometimes. The current code never gets remotely close; it can't even achieve a positive score.
The original code says:
```python
# This is a work in progress, and currently takes ~100 generations to
# find a network that can land with a score >= 200 at least a couple of
# times. It has yet to solve the environment
```
In the current code, I can run it for 500+ generations without the reward ever cresting above 0, so something has seriously regressed. Reading the code, the compute_fitness function makes no sense to me; I suspect rewards are being confused with network outputs. Also, the actual scores obtained when running the networks afterward are nowhere near the "fitness" being plotted, which points to a complete disconnect between "fitness" and actual score.
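For comparison, here is a minimal sketch of what I would expect fitness to mean: the raw episode reward, averaged over a few runs. This is my own sketch, not the example's code; it assumes the standard neat-python FeedForwardNetwork API and the old four-tuple gym step API, and eval_genome_fitness is a hypothetical name:

```python
import gym
import neat
import numpy as np

def eval_genome_fitness(genome, config, episodes=5):
    # Build the phenotype network from the genome (standard neat-python API).
    net = neat.nn.FeedForwardNetwork.create(genome, config)
    env = gym.make("LunarLander-v2")
    total_reward = 0.0
    for _ in range(episodes):
        observation = env.reset()
        done = False
        while not done:
            # Pick the discrete action with the highest network output.
            action = int(np.argmax(net.activate(observation)))
            observation, reward, done, _ = env.step(action)
            total_reward += reward
    env.close()
    # Fitness is the mean raw episode reward, so the plotted fitness
    # is directly comparable to the scores seen when replaying the net.
    return total_reward / episodes
```

With a definition like this, the fitness curve and the scores from replaying the evolved networks would at least be on the same scale.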
I will be debugging this in the next couple of days, but wanted to report the issue ahead of time.
To Reproduce
Steps to reproduce the behavior:
- `cd examples/openai-lander`
- `python evolve.py`
- See a `fitness.svg` plot like the one below. We can't achieve a positive reward (solving the task requires a reward of +200).