Description
I just found out that the original version of the Lunar Lander example was able to land successfully sometimes. The current code never gets remotely close; it can't even achieve a positive score.
The original code says:
```python
# This is a work in progress, and currently takes ~100 generations to
# find a network that can land with a score >= 200 at least a couple of
# times. It has yet to solve the environment
```
In the current code, I can run it for 500+ generations without the reward ever cresting above 0, so something has seriously regressed. Reading the code, the compute_fitness function makes no sense to me; I suspect rewards are being confused with network outputs. Also, the actual scores obtained when running the networks afterward are nowhere near the "fitness" being plotted, which points to a complete disconnect between "fitness" and actual score.
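For comparison, here is a minimal sketch of what I would expect fitness to mean: the raw episode reward, averaged over a few runs. This is my own sketch, not the example's code; it assumes the standard neat-python FeedForwardNetwork API and the old four-tuple gym step API, and eval_genome_fitness is a hypothetical name:

```python
import gym
import neat
import numpy as np

def eval_genome_fitness(genome, config, episodes=5):
    # Build the phenotype network from the genome (standard neat-python API).
    net = neat.nn.FeedForwardNetwork.create(genome, config)
    env = gym.make("LunarLander-v2")
    total_reward = 0.0
    for _ in range(episodes):
        observation = env.reset()
        done = False
        while not done:
            # Pick the discrete action with the highest network output.
            action = int(np.argmax(net.activate(observation)))
            observation, reward, done, _ = env.step(action)
            total_reward += reward
    env.close()
    # Fitness is the mean raw episode reward, so the plotted fitness
    # is directly comparable to the scores seen when replaying the net.
    return total_reward / episodes
```

With a definition like this, the fitness curve and the scores from replaying the evolved networks would at least be on the same scale.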
I will be debugging this in the next couple of days, but wanted to report the issue ahead of time.
To Reproduce
Steps to reproduce the behavior:
- `cd examples/openai-lander`
- `python evolve.py`
- See a `fitness.svg` plot like the one below. We can't achieve a positive reward (solving the task requires a reward of +200).