
Save optimizers #1714

Merged

merged 41 commits into tensorflow:master from save-optimizers on May 3, 2019

Conversation

Collaborator

@caisq caisq commented Apr 29, 2019

FEATURE

  • Change the classNames of the optimizers to align with TensorFlow (Python)
  • Use arrays to explicitly order the optimizers' weights, to facilitate weight loading and saving
  • Implement getWeights() for weight saving
  • Implement setWeights() for weight loading (see the usage sketch below)

Towards tensorflow/tfjs#83
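
The following usage sketch is not from the PR itself; it illustrates the intended save/restore flow, assuming the final async form of getWeights()/setWeights() agreed on later in this review. The variable and training step are made up for illustration.

import * as tf from '@tensorflow/tfjs-core';

async function roundTripOptimizerState() {
  const learningRate = 0.1;
  const optimizer1 = tf.train.rmsprop(learningRate);

  // Take one optimization step so the optimizer accumulates internal state.
  const x = tf.variable(tf.scalar(2));
  optimizer1.minimize(() => x.square() as tf.Scalar);

  // NamedTensor[]: an explicitly ordered [{name, tensor}, ...] list,
  // with the iterations scalar first.
  const weights = await optimizer1.getWeights();

  // Restore into a fresh optimizer of the same type and hyperparameters.
  const optimizer2 = tf.train.rmsprop(learningRate);
  await optimizer2.setWeights(weights);
}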



@caisq caisq requested review from nsthorat and dsmilkov May 1, 2019 14:06
Collaborator Author

@caisq caisq left a comment


Reviewable status: 0 of 1 approvals obtained (waiting on @dsmilkov and @nsthorat)


src/optimizers/optimizer.ts, line 25 at r2 (raw file):

import {NamedTensor, NamedTensorMap} from '../tensor_types';

export interface VariableWithOriginalName {

@nsthorat @dsmilkov

Note for reviewers: We talked about using the proper names of the tf.Variable objects. However, I found that doesn't work: more than one optimizer can be created for the same set of Variables, which would lead to a clash in variable names. But during optimizer serialization, we don't want the de-duplicating suffixes to show up in the names.
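
As a rough sketch of the idea (the field names and the import path below are my assumptions, not necessarily what the PR uses; the sketch assumes it lives next to src/optimizers/optimizer.ts): the interface pairs the variable with the name it should be serialized under, so the de-duplicated Variable.name never leaks into the saved weights.

import {Variable} from '../tensor';

export interface VariableWithOriginalName {
  // The name to use when saving/loading this optimizer weight.
  originalName: string;
  // The underlying variable; its .name may carry a de-duplicating suffix.
  variable: Variable;
}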

Contributor

@dsmilkov dsmilkov left a comment


Reviewable status: 0 of 1 approvals obtained (waiting on @caisq, @dsmilkov, and @nsthorat)


src/gradients.ts, line 262 at r3 (raw file):

 * @returns An object with the following keys and values:
 *   - `value`: The value of the function `f`.
 *   - `grads`: The a map from the names of the variables to the gradients.

typo: The map


src/gradients.ts, line 264 at r3 (raw file):

 *   - `grads`: The a map from the names of the variables to the gradients.
 *     If the `varList` argument is provided explicitly and contains a subset of
 *     untrainable variables, this map in the return value will contain keys

s/untrainable/non-trainable/ for consistency


src/io/types.ts, line 93 at r3 (raw file):

   * Type of the weight.
   *
   * Optinoal.

typo: optional


src/io/types.ts, line 96 at r3 (raw file):

   *
   * The value 'optimizer' indicates the weight belongs to an optimizer
   * (i.e., not the proper part of a model).

instead of "not the proper part of a model", how about "not used at inference time"


src/io/types.ts, line 124 at r3 (raw file):

   * Whether the optimizer will be saved (if exists).
   *
   * Deafult: `false`.

typo: default


src/optimizers/adadelta_optimizer.ts, line 112 at r3 (raw file):

  setWeights(weightValues: NamedTensor[]): void {
    weightValues = super.setIterations(weightValues);

What happens if someone implements a custom optimizer and forgets to call setIterations()? Would be great to avoid forcing all optimizers to call setIterations(). Can you call setIterations() in the deserialization code?


src/optimizers/optimizer.ts, line 25 at r2 (raw file):

Previously, caisq (Shanqing Cai) wrote…

@nsthorat @dsmilkov

Note for reviewers: We talked about using the proper names of the tf.Variable objects. However, I found that doesn't work: more than one optimizer can be created for the same set of Variables, which would lead to a clash in variable names. But during optimizer serialization, we don't want the de-duplicating suffixes to show up in the names.

How about calling it OptimizerVariable?


src/optimizers/optimizer.ts, line 70 at r3 (raw file):

  }

  get iterations(): Variable {

add jsdoc since it's public api.


src/optimizers/sgd_optimizer_test.ts, line 71 at r3 (raw file):

    weights = optimizer1.getWeights();
    // No iterations prior to applyGradients() call.

is this comment relevant here?


src/optimizers/sgd_optimizer_test.ts, line 78 at r3 (raw file):

    const optimizer2 = tf.train.sgd(learningRate);
    optimizer2.setWeights(weights);
    expectArraysClose(await optimizer2.iterations.data(), 1);

can you also test getWeights() after calling setWeights()?

Contributor

@dsmilkov dsmilkov left a comment


Reviewable status: 0 of 1 approvals obtained (waiting on @caisq, @dsmilkov, and @nsthorat)


src/optimizers/adagrad_optimizer.ts, line 80 at r3 (raw file):

  dispose(): void {
    super.dispose();

(Not for this PR) A general comment on the problems with inheritance: any implementer now has to remember to dispose the parent. It is easy to write a custom optimizer, forget to do this, and cause a memory leak. Ideally, instead of inheritance, we would have an OptimizationDriver (holding the shared code) that contains the optimizer as a private member; that way, disposing the parent always disposes the child.
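
A composition-style sketch of what this comment describes (illustrative only, not code from this PR; the class name OptimizationDriver comes from the comment, the counter and import paths are assumptions that place the sketch under src/optimizers/):

import {Scalar, Variable} from '../tensor';
import {Optimizer} from './optimizer';

class OptimizationDriver {
  // Shared bookkeeping lives in the driver, not in optimizer subclasses.
  private iterations = 0;

  constructor(private readonly optimizer: Optimizer) {}

  minimize(f: () => Scalar, returnCost = false, varList?: Variable[]): Scalar|null {
    this.iterations++;
    return this.optimizer.minimize(f, returnCost, varList);
  }

  dispose(): void {
    // Disposing the driver always disposes the wrapped optimizer,
    // so a custom optimizer can't leak by forgetting a super call.
    this.optimizer.dispose();
  }
}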

Collaborator Author

@caisq caisq left a comment


Thanks for the review!

Reviewable status: 0 of 1 approvals obtained (waiting on @dsmilkov and @nsthorat)


src/gradients.ts, line 262 at r3 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

typo: The map

Done. Changed to 'a map'.


src/gradients.ts, line 264 at r3 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

s/untrainable/non-trainable/ for consistency

Done. Corrected one other place in types.ts as well.


src/io/types.ts, line 93 at r3 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

typo: optional

Done.


src/io/types.ts, line 96 at r3 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

instead of "not the proper part of a model", how about "not used at inference time"

Done.


src/io/types.ts, line 124 at r3 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

typo: default

Done.


src/optimizers/adadelta_optimizer.ts, line 112 at r3 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

What happens if someone implements a custom optimizer and forgets to call setIterations()? Would be great to avoid forcing all optimizers to call setIterations(). Can you call setIterations() in the deserialization code?

See my reply below.


src/optimizers/adagrad_optimizer.ts, line 80 at r3 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

(Not for this PR) A general comment on the problems with inheritance: any implementer now has to remember to dispose the parent. It is easy to write a custom optimizer, forget to do this, and cause a memory leak. Ideally, instead of inheritance, we would have an OptimizationDriver (holding the shared code) that contains the optimizer as a private member; that way, disposing the parent always disposes the child.

I made the following change, which hopefully addresses this concern: I changed iterations from a Variable to a number. It leads to the following improvements:

  1. No need to call super.dispose() in a subclass's dispose() anymore, which removes the risk of a memory leak.
  2. It saves a (small) number of tensor operations during each applyGradients() or minimize() call.

Authors of custom optimizers do not have to worry about iterations unless they care about it. They can save the weights of their custom optimizer without the iterations "weight" in front, so there is no need to call super.getWeights() during saving or super.setIterations() during weight loading. If they do care about it, they can follow the pattern in these pre-made optimizers. So at this point, there is no requirement to call super methods in custom optimizers.
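
For illustration (the names accumulated_grad and the state below are invented, not code from the PR), the save path for a custom optimizer that ignores iterations can now be a plain NamedTensor[] with no leading iterations entry and no super calls:

import * as tf from '@tensorflow/tfjs-core';

type NamedTensor = {name: string, tensor: tf.Tensor};

// Hypothetical state tracked by a custom optimizer.
const accumulatedGrads = [tf.zeros([3]), tf.zeros([2])];

// Saving: no super.getWeights(), so no iterations entry in front.
const weights: NamedTensor[] = accumulatedGrads.map(
    (t, i) => ({name: `accumulated_grad_${i}`, tensor: t}));

// Loading: consume the array as-is; no super.setIterations() to strip a prefix.
const restored = weights.map(w => tf.variable(w.tensor, false, w.name));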


src/optimizers/optimizer.ts, line 25 at r2 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

How about calling it OptimizerVariable?

Done. That makes sense. I also added a doc string here to clarify why this interface is needed on top of Variable.


src/optimizers/optimizer.ts, line 70 at r3 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

add jsdoc since it's public api.

Done.


src/optimizers/sgd_optimizer_test.ts, line 71 at r3 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

is this comment relevant here?

Added missing assertions.


src/optimizers/sgd_optimizer_test.ts, line 78 at r3 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

can you also test getWeights() after calling setWeights()?

Added lines to test that, for both sgd and rmsprop.

Contributor

@dsmilkov dsmilkov left a comment


Reviewed 3 of 22 files at r2, 14 of 17 files at r4.
Reviewable status: 0 of 1 approvals obtained (waiting on @caisq and @nsthorat)


src/io/io_utils_test.ts, line 231 at r5 (raw file):

      {name: 'b41', tensor: tensor1d([-1.3, -3.7, 1.3, 3.7])}
    ];
    tf.io.encodeWeights(tensors)

use await instead of .then() (easier to read). Here and below.
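
For reference, a sketch of the suggested await form (the test name and assertions here are illustrative, not copied from io_utils_test.ts):

import * as tf from '@tensorflow/tfjs-core';

it('encodeWeights with a NamedTensor[] input', async () => {
  const tensors = [
    {name: 'b41', tensor: tf.tensor1d([-1.3, -3.7, 1.3, 3.7])},
  ];
  const {data, specs} = await tf.io.encodeWeights(tensors);
  expect(specs.length).toBe(1);
  expect(specs[0].name).toBe('b41');
  expect(data.byteLength).toBe(4 * 4);  // four float32 values
});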


src/optimizers/adagrad_optimizer.ts, line 80 at r3 (raw file):

Previously, caisq (Shanqing Cai) wrote…

I made the following change, which hopefully addresses this concern: I changed iterations from a Variable to a number. It leads to the following improvements:

  1. No need to call super.dispose() in a subclass's dispose() anymore, which removes the risk of a memory leak.
  2. It saves a (small) number of tensor operations during each applyGradients() or minimize() call.

Authors of custom optimizers do not have to worry about iterations unless they care about it. They can save the weights of their custom optimizer without the iterations "weight" in front, so there is no need to call super.getWeights() during saving or super.setIterations() during weight loading. If they do care about it, they can follow the pattern in these pre-made optimizers. So at this point, there is no requirement to call super methods in custom optimizers.

Thank you for improving this!


src/optimizers/optimizer.ts, line 154 at r4 (raw file):

   */
  protected setIterations(weightValues: NamedTensor[]): NamedTensor[] {
    this.iterations_ = weightValues[0].tensor.dataSync()[0];

Hmmm... New backends (e.g. wasm and webgpu) don't support dataSync(). Is it possible to move the logic of taking the first element of the weights further up the stack, to where serialization/deserialization happens?
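
A sketch of the backend-friendly direction (the helper name is invented): read the leading iterations scalar with the async data() instead of dataSync().

import * as tf from '@tensorflow/tfjs-core';

type NamedTensor = {name: string, tensor: tf.Tensor};

// Assumes the PR's convention that weightValues[0] is the iterations scalar.
async function readIterations(weightValues: NamedTensor[]): Promise<number> {
  return (await weightValues[0].tensor.data())[0];
}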


src/optimizers/optimizer.ts, line 123 at r5 (raw file):

   */
  dispose(): void {
    if (this.iterations_ != null) {

this is a number now, so no need for disposal

Contributor

@dsmilkov dsmilkov left a comment


Reviewable status: 0 of 1 approvals obtained (waiting on @caisq and @nsthorat)


src/optimizers/optimizer.ts, line 154 at r4 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

Hmmm... New backends (e.g. wasm and webgpu) don't support dataSync(). Is it possible to move the logic of taking the first element of the weights further up the stack, to where serialization/deserialization happens?

Chatted offline. setWeights and getWeights will become async


src/optimizers/rmsprop_optimizer.ts, line 152 at r5 (raw file):

      variables.push(...this.accumulatedMeanGrads);
    }
    return super.getWeights().concat(

let's have this be [this.getIterationScalar()].concat(myWeights) which parallels this.incrementIterations()

Collaborator Author

@caisq caisq left a comment


Reviewable status: 0 of 1 approvals obtained (waiting on @caisq and @nsthorat)


src/optimizers/optimizer.ts, line 154 at r4 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

Chatted offline. setWeights and getWeights will become async

Done. setWeights() and getWeights() are both offline.
So are the renamed saveIterations() and extractIterations() in the base class.

Now, none of the concrete Optimizer classes need to call super methods. They all call methods such as this.saveIterations() and this.extractIterations().

Also as discussed offline, I slightly revised AdamOptimizer in order to not lose the accBeta1 and accBeta2 state after saving and loading.
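
A minimal sketch of the final pattern described here (the base-class stand-in, the 'iter' weight name, and the toy subclass below are illustrative; only the saveIterations()/extractIterations() method names come from the comment above):

import * as tf from '@tensorflow/tfjs-core';

type NamedTensor = {name: string, tensor: tf.Tensor};

// Stand-in for the shared helpers on the Optimizer base class.
class OptimizerBase {
  protected iterations_ = 0;

  protected async saveIterations(): Promise<NamedTensor> {
    return {name: 'iter', tensor: tf.scalar(this.iterations_, 'int32')};
  }

  protected async extractIterations(
      weightValues: NamedTensor[]): Promise<NamedTensor[]> {
    this.iterations_ = (await weightValues[0].tensor.data())[0];
    return weightValues.slice(1);
  }
}

class ToyMomentumOptimizer extends OptimizerBase {
  private accumulations: NamedTensor[] = [];

  async getWeights(): Promise<NamedTensor[]> {
    // Iterations scalar first, then this optimizer's own state.
    return [await this.saveIterations()].concat(this.accumulations);
  }

  async setWeights(weightValues: NamedTensor[]): Promise<void> {
    // extractIterations() consumes the leading entry and returns the rest.
    this.accumulations = await this.extractIterations(weightValues);
  }
}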


src/optimizers/rmsprop_optimizer.ts, line 152 at r5 (raw file):

Previously, dsmilkov (Daniel Smilkov) wrote…

let's have this be [this.getIterationScalar()].concat(myWeights) which parallels this.incrementIterations()

Done. I named the method saveIterations().

Collaborator Author

caisq commented May 3, 2019

s/are both offline/are both async/

Contributor

@dsmilkov dsmilkov left a comment


Thanks for the discussions yesterday! I did some more searching on the motivation for iterations -- it seems all tf.keras optimizers (including sgd) support rate_decay, which uses iterations (this allows us to update our optimizers later). Really nice work!

Reviewed 2 of 8 files at r5, 14 of 14 files at r6.
Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @caisq and @nsthorat)


src/optimizers/rmsprop_optimizer.ts, line 152 at r5 (raw file):

Previously, caisq (Shanqing Cai) wrote…

Done. I named the method saveIterations().

SGTM.

@caisq caisq merged commit 28fd404 into tensorflow:master May 3, 2019
@caisq caisq deleted the save-optimizers branch May 3, 2019 15:05
caisq added a commit to tensorflow/tfjs-layers that referenced this pull request Jul 1, 2019
FEATURE

- Currently defaulting `includeOptimizers` to `false`, which is completely backward compatible. We can decide later whether to change it to `true` for certain environments.
- Allow loss names to be Python-style snake case.

Towards: tensorflow/tfjs#83

Depends on tensorflow/tfjs-core#1714