Lunar Lander Example is Seriously Regressed #256

Open
@ntraft

Description

I just found out that the original version of the Lunar Lander example was able to land successfully sometimes. In the current code, it never even gets remotely close. It can't even get a positive score.

The original code says:

# This is a work in progress, and currently takes ~100 generations to
# find a network that can land with a score >= 200 at least a couple of
# times.  It has yet to solve the environment

In the current code, I can run it for 500+ generations without the reward ever cresting above 0. So something has seriously regressed. Reading the code, the compute_fitness function makes no sense to me; I believe rewards are being confused with network outputs somewhere. Also, the actual scores obtained when running the networks afterward are nowhere near the "fitness" being plotted, which points to a complete disconnect between "fitness" and the actual score.
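For reference, a sanity check for this kind of disconnect is to define fitness as nothing more than the raw total episode reward, so the plotted fitness and the score seen when replaying the network must agree by construction. This is only a minimal sketch: `StubEnv`, `eval_episode_reward`, and the trivial policy are hypothetical stand-ins, not the example's actual code; the real example would use `gym.make("LunarLander-v2")` and a `neat.nn.FeedForwardNetwork` for the policy.

```python
class StubEnv:
    """Hypothetical gym-style stand-in for LunarLander-v2: 3 steps, fixed rewards."""
    def __init__(self):
        self._rewards = [1.0, -0.5, 2.0]
        self._t = 0

    def reset(self):
        self._t = 0
        return [0.0, 0.0, 0.0, 0.0]  # dummy observation

    def step(self, action):
        reward = self._rewards[self._t]
        self._t += 1
        done = self._t >= len(self._rewards)
        return [0.0] * 4, reward, done, {}


def eval_episode_reward(env, policy, max_steps=1000):
    """Fitness = undiscounted sum of the environment's own rewards,
    so it cannot drift away from the score a human would observe."""
    obs = env.reset()
    total = 0.0
    for _ in range(max_steps):
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
        if done:
            break
    return total


print(eval_episode_reward(StubEnv(), lambda obs: 0))  # 1.0 - 0.5 + 2.0 = 2.5
```

If the current example's compute_fitness were checked against a baseline like this, any transformation it applies to rewards (or any mixing-in of network outputs) would show up as a divergence between the two numbers.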

I will be debugging this in the next couple of days, but wanted to report the issue ahead of time.

To Reproduce

Steps to reproduce the behavior:

  1. cd examples/openai-lander
  2. python evolve.py
  3. See a fitness.svg plot like the one below. It never achieves a positive reward (solving the environment requires a score of +200).

[fitness.svg plot: fitness never rises above 0 across generations]
