
Save optimizers #1714

Merged

merged 41 commits into tensorflow:master from save-optimizers on May 3, 2019

Conversation

Collaborator

@caisq caisq commented Apr 29, 2019

FEATURE

  • Change the classNames of the optimizers to align with TensorFlow (Python)
  • Use arrays to explicitly order the optimizers' weights, to facilitate weight loading and saving
  • Implement getWeights() for weight saving
  • Implement setWeights() for weight loading (see the usage sketch below)

Towards tensorflow/tfjs#83
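
The following usage sketch is not from the PR itself; it illustrates the intended save/restore flow, assuming the final async form of getWeights()/setWeights() agreed on later in this review. The variable and training step are made up for illustration.

import * as tf from '@tensorflow/tfjs-core';

async function roundTripOptimizerState() {
  const learningRate = 0.1;
  const optimizer1 = tf.train.rmsprop(learningRate);

  // Take one optimization step so the optimizer accumulates internal state.
  const x = tf.variable(tf.scalar(2));
  optimizer1.minimize(() => x.square() as tf.Scalar);

  // NamedTensor[]: an explicitly ordered [{name, tensor}, ...] list,
  // with the iterations scalar first.
  const weights = await optimizer1.getWeights();

  // Restore into a fresh optimizer of the same type and hyperparameters.
  const optimizer2 = tf.train.rmsprop(learningRate);
  await optimizer2.setWeights(weights);
}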



@caisq caisq requested review from nsthorat and dsmilkov May 1, 2019 14:06
Collaborator Author

@caisq caisq left a comment


Reviewable status: 0 of 1 approvals obtained (waiting on @dsmilkov and @nsthorat)


src/optimizers/optimizer.ts, line 25 at r2 (raw file):

import {NamedTensor, NamedTensorMap} from '../tensor_types';

export interface VariableWithOriginalName {

@nsthorat @dsmilkov

Note for reviewers: We talked about using the proper names of the tf.Variable objects. However, I found that doesn't work: more than one optimizer can be created for the same set of Variables, which would lead to a clash in variable names. But during optimizer serialization, we don't want the de-duplicating suffixes to show up in the names.
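
As a rough sketch of the idea (the field names and the import path below are my assumptions, not necessarily what the PR uses; the sketch assumes it lives next to src/optimizers/optimizer.ts): the interface pairs the variable with the name it should be serialized under, so the de-duplicated Variable.name never leaks into the saved weights.

import {Variable} from '../tensor';

export interface VariableWithOriginalName {
  // The name to use when saving/loading this optimizer weight.
  originalName: string;
  // The underlying variable; its .name may carry a de-duplicating suffix.
  variable: Variable;
}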

Contributor

@dsmilkov dsmilkov left a comment


Reviewable status: 0 of 1 approvals obtained (waiting on @caisq, @dsmilkov, and @nsthorat)


src/gradients.ts, line 262 at r3 (raw file):

 * @returns An object with the following keys and values:
 *   - `value`: The value of the function `f`.
 *   - `grads`: The a map from the names of the variables to the gradients.

typo: The map


src/gradients.ts, line 264 at r3 (raw file):

 *   - `grads`: The a map from the names of the variables to the gradients.
 *     If the `varList` argument is provided explicitly and contains a subset of
 *     untrainable variables, this map in the return value will contain keys

s/untrainable/non-trainable/ for consistency


src/io/types.ts, line 93 at r3 (raw file):

   * Type of the weight.
   *
   * Optinoal.

typo: optional


src/io/types.ts, line 96 at r3 (raw file):

   *
   * The value 'optimizer' indicates the weight belongs to an optimizer
   * (i.e., not the proper part of a model).

instead of "not the proper part of a model", how about "not used at inference time"


src/io/types.ts, line 124 at r3 (raw file):

   * Whether the optimizer will be saved (if exists).
   *
   * Deafult: `false`.

typo: default


src/optimizers/adadelta_optimizer.ts, line 112 at r3 (raw file):

  setWeights(weightValues: NamedTensor[]): void {
    weightValues = super.setIterations(weightValues);

What happens if someone implements a custom optimizer and forgets to call setIterations()? Would be great to avoid forcing all optimizers to call setIterations(). Can you call setIterations() in the deserialization code?


src/optimizers/optimizer.ts, line 25 at r2 (raw file):

Previously, caisq (Shanqing Cai) wrote…

@nsthorat @dsmilkov

Note for reviewers: We talked about using the proper names of the tf.Variable objects. However, I found that doesn't work: more than one optimizer can be created for the same set of Variables, which would lead to a clash in variable names. But during optimizer serialization, we don't want the de-duplicating suffixes to show up in the names.

How about calling it OptimizerVariable?


src/optimizers/optimizer.ts, line 70 at r3 (raw file):

  }

  get iterations(): Variable {

add jsdoc since it's public api.


src/optimizers/sgd_optimizer_test.ts, line 71 at r3 (raw file):

    weights = optimizer1.getWeights();
    // No iterations prior to applyGradients() call.

is this comment relevant here?


src/optimizers/sgd_optimizer_test.ts, line 78 at r3 (raw file):

    const optimizer2 = tf.train.sgd(learningRate);
    optimizer2.setWeights(weights);
    expectArraysClose(await optimizer2.iterations.data(), 1);

can you also test getWeights() after calling setWeights()?

Contributor

@dsmilkov dsmilkov left a comment


Reviewable status: 0 of 1 approvals obtained (waiting on @caisq, @dsmilkov, and @nsthorat)


src/optimizers/adagrad_optimizer.ts, line 80 at r3 (raw file):

  dispose(): void {
    super.dispose();

(Not for this PR) A general comment on the problems with inheritance: any implementer now has to remember to dispose the parent. It is easy to write a custom optimizer, forget to do this, and cause a memory leak. Ideally, instead of inheritance, we would have an OptimizationDriver (holding the shared code) that contains the optimizer as a private member; that way, disposing the parent always disposes the child.
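
A composition-style sketch of what this comment describes (illustrative only, not code from this PR; the class name OptimizationDriver comes from the comment, the counter and import paths are assumptions that place the sketch under src/optimizers/):

import {Scalar, Variable} from '../tensor';
import {Optimizer} from './optimizer';

class OptimizationDriver {
  // Shared bookkeeping lives in the driver, not in optimizer subclasses.
  private iterations = 0;

  constructor(private readonly optimizer: Optimizer) {}

  minimize(f: () => Scalar, returnCost = false, varList?: Variable[]): Scalar|null {
    this.iterations++;
    return this.optimizer.minimize(f, returnCost, varList);
  }

  dispose(): void {
    // Disposing the driver always disposes the wrapped optimizer,
    // so a custom optimizer can't leak by forgetting a super call.
    this.optimizer.dispose();
  }
}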

Collaborator Author

@caisq caisq left a comment


Thanks for the review!

Reviewable status: 0 of 1 approvals obtained (waiting on @dsmilkov and @nsthorat)


src/gradients.ts, line 262 at r3 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

typo: The map

Done. Changed to 'a map'.


src/gradients.ts, line 264 at r3 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

s/untrainable/non-trainable/ for consistency

Done. Corrected one other place in types.ts as well.


src/io/types.ts, line 93 at r3 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

typo: optional

Done.


src/io/types.ts, line 96 at r3 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

instead of "not the proper part of a model", how about "not used at inference time"

Done.


src/io/types.ts, line 124 at r3 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

typo: default

Done.


src/optimizers/adadelta_optimizer.ts, line 112 at r3 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

What happens if someone implements a custom optimizer and forgets to call setIterations()? Would be great to avoid forcing all optimizers to call setIterations(). Can you call setIterations() in the deserialization code?

See my reply below.


src/optimizers/adagrad_optimizer.ts, line 80 at r3 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

(Not for this PR) A general comment on the problems with inheritance: any implementer now has to remember to dispose the parent. It is easy to write a custom optimizer, forget to do this, and cause a memory leak. Ideally, instead of inheritance, we would have an OptimizationDriver (holding the shared code) that contains the optimizer as a private member; that way, disposing the parent always disposes the child.

I made the following change, which hopefully addresses this concern: I changed iterations from a Variable to a number. It leads to the following improvements:

  1. No need to call super.dispose() in a subclass's dispose() anymore, which removes the risk of a memory leak.
  2. It saves a (small) number of tensor operations during each applyGradients() or minimize() call.

Authors of custom optimizers do not have to worry about iterations unless they care about it. They can save the weights of their custom optimizer without the iterations "weight" in front, so there is no need to call super.getWeights() during saving or super.setIterations() during weight loading. If they do care about it, they can follow the pattern in these pre-made optimizers. So at this point, there is no requirement to call super methods in custom optimizers.
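
For illustration (the names accumulated_grad and the state below are invented, not code from the PR), the save path for a custom optimizer that ignores iterations can now be a plain NamedTensor[] with no leading iterations entry and no super calls:

import * as tf from '@tensorflow/tfjs-core';

type NamedTensor = {name: string, tensor: tf.Tensor};

// Hypothetical state tracked by a custom optimizer.
const accumulatedGrads = [tf.zeros([3]), tf.zeros([2])];

// Saving: no super.getWeights(), so no iterations entry in front.
const weights: NamedTensor[] = accumulatedGrads.map(
    (t, i) => ({name: `accumulated_grad_${i}`, tensor: t}));

// Loading: consume the array as-is; no super.setIterations() to strip a prefix.
const restored = weights.map(w => tf.variable(w.tensor, false, w.name));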


src/optimizers/optimizer.ts, line 25 at r2 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

How about calling it OptimizerVariable?

Done. That makes sense. I also added a doc string here to clarify why this interface is needed on top of Variable.


src/optimizers/optimizer.ts, line 70 at r3 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

add jsdoc since it's public api.

Done.


src/optimizers/sgd_optimizer_test.ts, line 71 at r3 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

is this comment relevant here?

Added missing assertions.


src/optimizers/sgd_optimizer_test.ts, line 78 at r3 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

can you also test getWeights() after calling setWeights()?

Added lines to test that, for both sgd and rmsprop.

Contributor

@dsmilkov dsmilkov left a comment


Reviewed 3 of 22 files at r2, 14 of 17 files at r4.
Reviewable status: 0 of 1 approvals obtained (waiting on @caisq and @nsthorat)


src/io/io_utils_test.ts, line 231 at r5 (raw file):

      {name: 'b41', tensor: tensor1d([-1.3, -3.7, 1.3, 3.7])}
    ];
    tf.io.encodeWeights(tensors)

use await instead of .then() (easier to read). Here and below.
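
For reference, a sketch of the suggested await form (the test name and assertions here are illustrative, not copied from io_utils_test.ts):

import * as tf from '@tensorflow/tfjs-core';

it('encodeWeights with a NamedTensor[] input', async () => {
  const tensors = [
    {name: 'b41', tensor: tf.tensor1d([-1.3, -3.7, 1.3, 3.7])},
  ];
  const {data, specs} = await tf.io.encodeWeights(tensors);
  expect(specs.length).toBe(1);
  expect(specs[0].name).toBe('b41');
  expect(data.byteLength).toBe(4 * 4);  // four float32 values
});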


src/optimizers/adagrad_optimizer.ts, line 80 at r3 (raw file):

Previously, caisq (Shanqing Cai) wrote…

I made the following change, which hopefully addresses this concern: I changed iterations from a Variable to a number. It leads to the following improvements:

  1. No need to call super.dispose() in a subclass's dispose() anymore, which removes the risk of a memory leak.
  2. It saves a (small) number of tensor operations during each applyGradients() or minimize() call.

Authors of custom optimizers do not have to worry about iterations unless they care about it. They can save the weights of their custom optimizer without the iterations "weight" in front, so there is no need to call super.getWeights() during saving or super.setIterations() during weight loading. If they do care about it, they can follow the pattern in these pre-made optimizers. So at this point, there is no requirement to call super methods in custom optimizers.

Thank you for improving this!


src/optimizers/optimizer.ts, line 154 at r4 (raw file):

   */
  protected setIterations(weightValues: NamedTensor[]): NamedTensor[] {
    this.iterations_ = weightValues[0].tensor.dataSync()[0];

Hmmm... New backends (e.g. wasm and webgpu) don't support dataSync(). Is it possible to move the logic of taking the first element of the weights further up the stack, to where serialization/deserialization happens?
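
A sketch of the backend-friendly direction (the helper name is invented): read the leading iterations scalar with the async data() instead of dataSync().

import * as tf from '@tensorflow/tfjs-core';

type NamedTensor = {name: string, tensor: tf.Tensor};

// Assumes the PR's convention that weightValues[0] is the iterations scalar.
async function readIterations(weightValues: NamedTensor[]): Promise<number> {
  return (await weightValues[0].tensor.data())[0];
}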


src/optimizers/optimizer.ts, line 123 at r5 (raw file):

   */
  dispose(): void {
    if (this.iterations_ != null) {

this is a number now, so no need for disposal

Contributor

@dsmilkov dsmilkov left a comment


Reviewable status: 0 of 1 approvals obtained (waiting on @caisq and @nsthorat)


src/optimizers/optimizer.ts, line 154 at r4 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

Hmmm... New backends (e.g. wasm and webgpu) don't support dataSync(). Is it possible to move the logic of taking the first element of the weights further up the stack, to where serialization/deserialization happens?

Chatted offline. setWeights and getWeights will become async


src/optimizers/rmsprop_optimizer.ts, line 152 at r5 (raw file):

      variables.push(...this.accumulatedMeanGrads);
    }
    return super.getWeights().concat(

let's have this be [this.getIterationScalar()].concat(myWeights) which parallels this.incrementIterations()

Collaborator Author

@caisq caisq left a comment


Reviewable status: 0 of 1 approvals obtained (waiting on @caisq and @nsthorat)


src/optimizers/optimizer.ts, line 154 at r4 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

Chatted offline. setWeights and getWeights will become async

Done. setWeights() and getWeights() are both offline.
So are the renamed saveIterations() and extractIterations() in the base class.

Now, none of the concrete Optimizer classes need to call super methods. They all call methods such as this.saveIterations() and this.extractIterations().

Also as discussed offline, I slightly revised AdamOptimizer in order to not lose the accBeta1 and accBeta2 state after saving and loading.
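
A minimal sketch of the final pattern described here (the base-class stand-in, the 'iter' weight name, and the toy subclass below are illustrative; only the saveIterations()/extractIterations() method names come from the comment above):

import * as tf from '@tensorflow/tfjs-core';

type NamedTensor = {name: string, tensor: tf.Tensor};

// Stand-in for the shared helpers on the Optimizer base class.
class OptimizerBase {
  protected iterations_ = 0;

  protected async saveIterations(): Promise<NamedTensor> {
    return {name: 'iter', tensor: tf.scalar(this.iterations_, 'int32')};
  }

  protected async extractIterations(
      weightValues: NamedTensor[]): Promise<NamedTensor[]> {
    this.iterations_ = (await weightValues[0].tensor.data())[0];
    return weightValues.slice(1);
  }
}

class ToyMomentumOptimizer extends OptimizerBase {
  private accumulations: NamedTensor[] = [];

  async getWeights(): Promise<NamedTensor[]> {
    // Iterations scalar first, then this optimizer's own state.
    return [await this.saveIterations()].concat(this.accumulations);
  }

  async setWeights(weightValues: NamedTensor[]): Promise<void> {
    // extractIterations() consumes the leading entry and returns the rest.
    this.accumulations = await this.extractIterations(weightValues);
  }
}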


src/optimizers/rmsprop_optimizer.ts, line 152 at r5 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

let's have this be [this.getIterationScalar()].concat(myWeights) which parallels this.incrementIterations()

Done. I named the method saveIterations().

Collaborator Author

caisq commented May 3, 2019

s/are both offline/are both async/

Contributor

@dsmilkov dsmilkov left a comment


Thanks for the discussions yesterday! I did some more searching on the motivation for iterations -- it seems all tf.keras optimizers (including sgd) support rate_decay, which uses iterations (this allows us to update our optimizers later). Really nice work!

Reviewed 2 of 8 files at r5, 14 of 14 files at r6.
Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @caisq and @nsthorat)


src/optimizers/rmsprop_optimizer.ts, line 152 at r5 (raw file):

Previously, caisq (Shanqing Cai) wrote…

Done. I named the method saveIterations().

SGTM.

@caisq caisq merged commit 28fd404 into tensorflow:master May 3, 2019
@caisq caisq deleted the save-optimizers branch May 3, 2019 15:05
caisq added a commit to tensorflow/tfjs-layers that referenced this pull request Jul 1, 2019
FEATURE

- Currently defaulting `includeOptimizers` to `false`, which is completely backward compatible. We can decide later whether to change it to `true` for certain environments.
- Allow loss names to be Python-style snake case.

Towards: tensorflow/tfjs#83

Depends on tensorflow/tfjs-core#1714