Conversation
- `variableGrads()`: If a list of variables is specified, the non-trainable ones should lead to explicit `null` entries in the return value.
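To illustrate the behavior described in that bullet, here is a hedged usage sketch (the variable names and values are illustrative, not taken from the PR):

```ts
import * as tf from '@tensorflow/tfjs-core';

// Two variables: one trainable, one not.
const a = tf.variable(tf.scalar(2), true /* trainable */, 'a');
const b = tf.variable(tf.scalar(3), false /* non-trainable */, 'b');

// Ask for gradients with respect to both variables explicitly.
const {value, grads} = tf.variableGrads(() => a.mul(b).asScalar(), [a, b]);

console.log(value.dataSync());    // Value of f at the current variables: [6].
console.log(grads['a'] != null);  // true: a gradient is returned for 'a'.
console.log(grads['b']);          // null: explicit entry for non-trainable 'b'.
```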
Reviewable status: 0 of 1 approvals obtained (waiting on @dsmilkov and @nsthorat)
src/optimizers/optimizer.ts, line 25 at r2 (raw file):
import {NamedTensor, NamedTensorMap} from '../tensor_types'; export interface VariableWithOriginalName {
Note for reviewers: We talked about using the proper name of the `tf.Variable` objects. However, I found that this does not work: more than one optimizer can be created for the same set of `Variable`s, which would lead to a clash in variable names. But during optimizer serialization, we don't want the deduplicating suffixes to show up in the names.
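For context, a minimal sketch of the kind of pairing this interface provides (the field names and import path are assumptions, not necessarily the exact code in the PR):

```ts
import {Variable} from '../tensor';

// Pairs a variable with the name it should be serialized under, so that any
// deduplicating suffix on variable.name never leaks into saved weights.
export interface VariableWithOriginalName {
  originalName: string;
  variable: Variable;
}
```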
Reviewable status: 0 of 1 approvals obtained (waiting on @caisq, @dsmilkov, and @nsthorat)
src/gradients.ts, line 262 at r3 (raw file):
* @returns An object with the following keys and values: * - `value`: The value of the function `f`. * - `grads`: The a map from the names of the variables to the gradients.
typo: The map
src/gradients.ts, line 264 at r3 (raw file):
* - `grads`: The a map from the names of the variables to the gradients. * If the `varList` argument is provided explicitly and contains a subset of * untrainable variables, this map in the return value will contain keys
s/untrainable/non-trainable/ for consistency
src/io/types.ts, line 93 at r3 (raw file):
* Type of the weight. * * Optinoal.
typo: optional
src/io/types.ts, line 96 at r3 (raw file):
* * The value 'optimizer' indicates the weight belongs to an optimizer * (i.e., not the proper part of a model).
instead of "not the proper part of a model", how about "not used at inference time"
src/io/types.ts, line 124 at r3 (raw file):
* Whether the optimizer will be saved (if exists). * * Deafult: `false`.
typo: default
src/optimizers/adadelta_optimizer.ts, line 112 at r3 (raw file):
setWeights(weightValues: NamedTensor[]): void { weightValues = super.setIterations(weightValues);
What happens if someone implements a custom optimizer and forgets to call setIterations()? Would be great to avoid forcing all optimizers to call setIterations(). Can you call setIterations() in the deserialization code?
src/optimizers/optimizer.ts, line 25 at r2 (raw file):
Previously, caisq (Shanqing Cai) wrote…
Note for reviewers: We talked about using the proper name of the `tf.Variable` objects. However, I found that this does not work: more than one optimizer can be created for the same set of `Variable`s, which would lead to a clash in variable names. But during optimizer serialization, we don't want the deduplicating suffixes to show up in the names.
How about calling it OptimizerVariable?
src/optimizers/optimizer.ts, line 70 at r3 (raw file):
} get iterations(): Variable {
add jsdoc since it's public api.
src/optimizers/sgd_optimizer_test.ts, line 71 at r3 (raw file):
weights = optimizer1.getWeights(); // No iterations prior to applyGradients() call.
is this comment relevant here?
src/optimizers/sgd_optimizer_test.ts, line 78 at r3 (raw file):
const optimizer2 = tf.train.sgd(learningRate); optimizer2.setWeights(weights); expectArraysClose(await optimizer2.iterations.data(), 1);
Can you also test getWeights() after calling setWeights()?
Reviewable status: 0 of 1 approvals obtained (waiting on @caisq, @dsmilkov, and @nsthorat)
src/optimizers/adagrad_optimizer.ts, line 80 at r3 (raw file):
dispose(): void { super.dispose();
(Not for this PR) A general comment on the problems with inheritance. Any implementer now has to remember to dispose the parent. It is easy to make a custom optimizer, forget to do this, and cause a memory leak. Ideally, instead of inheritance, we would have an OptimizationDriver (that has the shared code), and it would contain a private member holding an optimizer. That way, disposing the parent will always dispose the child.
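A rough sketch of that composition idea (the `OptimizationDriver` name comes from the comment above; the interface and method names are otherwise hypothetical, not an actual tfjs API):

```ts
import {NamedTensorMap} from '../tensor_types';

// Shared bookkeeping lives in the driver; the concrete optimizer only
// implements its update rule and its own cleanup.
interface OptimizerCore {
  applyGradients(gradients: NamedTensorMap): void;
  dispose(): void;
}

class OptimizationDriver {
  private iterations = 0;

  constructor(private readonly core: OptimizerCore) {}

  applyGradients(gradients: NamedTensorMap): void {
    this.core.applyGradients(gradients);
    this.iterations++;
  }

  dispose(): void {
    // Disposing the driver always disposes the optimizer it owns, so a
    // custom optimizer author cannot forget to call super.dispose().
    this.core.dispose();
  }
}
```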
Thanks for the review!
Reviewable status: 0 of 1 approvals obtained (waiting on @dsmilkov and @nsthorat)
src/gradients.ts, line 262 at r3 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
typo: The map
Done. Changed to 'a map'.
src/gradients.ts, line 264 at r3 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
s/untrainable/non-trainable/ for consistency
Done. Corrected one other place in types.ts as well.
src/io/types.ts, line 93 at r3 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
typo: optional
Done.
src/io/types.ts, line 96 at r3 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
instead of "not the proper part of a model", how about "not used at inference time"
Done.
src/io/types.ts, line 124 at r3 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
typo: default
Done.
src/optimizers/adadelta_optimizer.ts, line 112 at r3 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
What happens if someone implements a custom optimizer and forgets to call setIterations()? Would be great to avoid forcing all optimizers to call setIterations(). Can you call setIterations() in the deserialization code?
See my reply below.
src/optimizers/adagrad_optimizer.ts, line 80 at r3 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
(Not for this PR) A general comment on the problems with inheritance. Any implementer now has to remember to dispose the parent. It is easy to make a custom optimizer, forget to do this, and cause a memory leak. Ideally, instead of inheritance, we would have an OptimizationDriver (that has the shared code), and it would contain a private member holding an optimizer. That way, disposing the parent will always dispose the child.
I made the following change which hopefully addresses this concern: I changed iterations from a variable to a number. It leads to the following improvements:
- No need to call `super.dispose()` in subclass `dispose()` anymore. Prevents the risk of memory leaks.
- It saves (a small number of) tensor operations during each `applyGradients()` or `minimize()` call.

Authors of a custom optimizer do not have to worry about iterations unless they care about it. They can save the weights of their custom optimizer without the iterations "weight" in the front, so there is no need to call `super.getWeights()` during saving and `super.setIterations()` during weight loading. If they care about it, they can follow the pattern in these pre-made optimizers. So at this point, there is no requirement to call super methods in custom optimizers.
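A simplified sketch of that change (member and method names are assumptions; the real class is `Optimizer` in optimizer.ts):

```ts
abstract class OptimizerSketch {
  // A plain number: nothing to dispose, and no tensor ops are spent on
  // bookkeeping during applyGradients()/minimize().
  private iterations_ = 0;

  /** The number of iterations this optimizer instance has been invoked for. */
  get iterations(): number {
    return this.iterations_;
  }

  protected incrementIterations(): void {
    this.iterations_++;
  }
}
```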
src/optimizers/optimizer.ts, line 25 at r2 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
How about calling it OptimizerVariable?
Done. That makes sense. I also added a doc string here to clarify why this interface is needed on top of `Variable`.
src/optimizers/optimizer.ts, line 70 at r3 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
add jsdoc since it's public api.
Done.
src/optimizers/sgd_optimizer_test.ts, line 71 at r3 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
is this comment relevant here?
Added missing assertions.
src/optimizers/sgd_optimizer_test.ts, line 78 at r3 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
Can you also test getWeights() after calling setWeights()?
Added lines to test that, for both sgd and rmsprop.
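A hedged sketch of what such a round-trip test might look like (not the actual test added in the PR; it assumes the async `getWeights()`/`setWeights()` discussed later in this thread, plus `tf`, `expectArraysClose`, and the Jasmine helpers already imported by the test file):

```ts
it('getWeights() after setWeights() round-trips for SGD', async () => {
  const learningRate = 0.1;
  const optimizer1 = tf.train.sgd(learningRate);
  const x = tf.variable(tf.scalar(2));
  optimizer1.minimize(() => x.square().asScalar());

  const weights = await optimizer1.getWeights();

  // Load the saved weights into a fresh optimizer and read them back.
  const optimizer2 = tf.train.sgd(learningRate);
  await optimizer2.setWeights(weights);
  const weights2 = await optimizer2.getWeights();

  expect(weights2.length).toEqual(weights.length);
  expectArraysClose(
      await weights2[0].tensor.data(), await weights[0].tensor.data());
});
```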
Reviewed 3 of 22 files at r2, 14 of 17 files at r4.
Reviewable status: 0 of 1 approvals obtained (waiting on @caisq and @nsthorat)
src/io/io_utils_test.ts, line 231 at r5 (raw file):
{name: 'b41', tensor: tensor1d([-1.3, -3.7, 1.3, 3.7])} ]; tf.io.encodeWeights(tensors)
use await instead of .then() (easier to read). Here and below.
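For illustration, a minimal version of the suggested style (the tensor literal is borrowed from the quoted test line; `tensor1d`, `tf`, and the Jasmine helpers are assumed to come from the test file's imports, and the assertions are representative rather than the PR's exact ones):

```ts
it('encodeWeights with await', async () => {
  const tensors = [{name: 'b41', tensor: tensor1d([-1.3, -3.7, 1.3, 3.7])}];
  // Instead of: tf.io.encodeWeights(tensors).then(result => { ... });
  const {data, specs} = await tf.io.encodeWeights(tensors);
  expect(specs.length).toEqual(1);
  expect(data.byteLength).toEqual(4 * 4);  // Four float32 values.
});
```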
src/optimizers/adagrad_optimizer.ts, line 80 at r3 (raw file):
Previously, caisq (Shanqing Cai) wrote…
I made the following change which hopefully addresses this concern: I changed iterations from a variable to a number. It leads to the following improvements:
- No need to call `super.dispose()` in subclass `dispose()` anymore. Prevents the risk of memory leaks.
- It saves (a small number of) tensor operations during each `applyGradients()` or `minimize()` call.

Authors of a custom optimizer do not have to worry about iterations unless they care about it. They can save the weights of their custom optimizer without the iterations "weight" in the front, so there is no need to call `super.getWeights()` during saving and `super.setIterations()` during weight loading. If they care about it, they can follow the pattern in these pre-made optimizers. So at this point, there is no requirement to call super methods in custom optimizers.
Thank you for improving this!
src/optimizers/optimizer.ts, line 154 at r4 (raw file):
*/ protected setIterations(weightValues: NamedTensor[]): NamedTensor[] { this.iterations_ = weightValues[0].tensor.dataSync()[0];
Hmmm... New backends (e.g. wasm and webgpu) don't support dataSync(). Is it possible to move the logic of taking the first element of the weights further up the stack, when serialization/deserialization happens?
src/optimizers/optimizer.ts, line 123 at r5 (raw file):
*/ dispose(): void { if (this.iterations_ != null) {
this is a number now, so no need for disposal
Reviewable status: 0 of 1 approvals obtained (waiting on @caisq and @nsthorat)
src/optimizers/optimizer.ts, line 154 at r4 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
Hmmm... New backends (e.g. wasm and webgpu) don't support dataSync(). Is it possible to move the logic of taking the first element of the weights further up the stack, when serialization/deserialization happens?
Chatted offline. setWeights and getWeights will become async
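A hedged sketch of the async direction (the method name follows the later rename to `extractIterations()`; the exact signature and field names are assumptions, not the final optimizer.ts code):

```ts
import {NamedTensor} from '../tensor_types';

abstract class AsyncIterationsSketch {
  protected iterations_ = 0;

  protected async extractIterations(
      weightValues: NamedTensor[]): Promise<NamedTensor[]> {
    // tensor.data() is asynchronous and works on backends without
    // synchronous reads (e.g. wasm, webgpu), unlike dataSync().
    this.iterations_ = (await weightValues[0].tensor.data())[0];
    return weightValues.slice(1);
  }
}
```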
src/optimizers/rmsprop_optimizer.ts, line 152 at r5 (raw file):
variables.push(...this.accumulatedMeanGrads); } return super.getWeights().concat(
Let's have this be `[this.getIterationScalar()].concat(myWeights)`, which parallels `this.incrementIterations()`.
Reviewable status: 0 of 1 approvals obtained (waiting on @caisq and @nsthorat)
src/optimizers/optimizer.ts, line 154 at r4 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
Chatted offline. setWeights and getWeights will become async
Done. `setWeights()` and `getWeights()` are both async. So are the renamed `saveIterations()` and `extractIterations()` in the base class.
Now, none of the concrete Optimizer classes need to call super methods. They all call methods such as `this.saveIterations()` and `this.extractIterations()`.
Also, as discussed offline, I slightly revised `AdamOptimizer` in order to not lose the `accBeta1` and `accBeta2` state after saving and loading.
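A hedged, self-contained sketch of that pattern for the simplest case (an SGD-style optimizer with no per-variable state); it mirrors the description above but is not a verbatim copy of the PR's code, and the class and field names are illustrative:

```ts
import * as tf from '@tensorflow/tfjs-core';

type NamedTensor = {name: string, tensor: tf.Tensor};

abstract class BaseOptimizerSketch {
  protected iterations_ = 0;

  protected async saveIterations(): Promise<NamedTensor> {
    return {name: 'iter', tensor: tf.scalar(this.iterations_, 'int32')};
  }

  protected async extractIterations(
      weightValues: NamedTensor[]): Promise<NamedTensor[]> {
    this.iterations_ = (await weightValues[0].tensor.data())[0];
    return weightValues.slice(1);
  }
}

class SgdSketch extends BaseOptimizerSketch {
  async getWeights(): Promise<NamedTensor[]> {
    // Plain SGD keeps no per-variable state beyond the iteration count.
    return [await this.saveIterations()];
  }

  async setWeights(weightValues: NamedTensor[]): Promise<void> {
    weightValues = await this.extractIterations(weightValues);
    if (weightValues.length !== 0) {
      throw new Error('SGD optimizer does not have settable weights.');
    }
  }
}
```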
src/optimizers/rmsprop_optimizer.ts, line 152 at r5 (raw file):
Previously, dsmilkov (Daniel Smilkov) wrote…
Let's have this be `[this.getIterationScalar()].concat(myWeights)`, which parallels `this.incrementIterations()`.
Done. I named the method `saveIterations()`.
Thanks for the discussions yesterday! I did some more searching about the motivation for `iterations`: it seems like all tf.keras optimizers support rate decay (including SGD), which uses `iterations` (this allows us to later update our optimizers). Really nice work!
Reviewed 2 of 8 files at r5, 14 of 14 files at r6.
Reviewable status: complete! 1 of 1 approvals obtained (waiting on @caisq and @nsthorat)
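For context on that motivation, a hedged sketch of the classic time-based decay used by tf.keras optimizers, which is what needs the iteration count (shown for illustration only; this is not code from the PR):

```ts
// lr_t = lr_0 / (1 + decay * t), where t is the optimizer's iteration count.
function decayedLearningRate(
    initialLearningRate: number, decay: number, iterations: number): number {
  return initialLearningRate / (1 + decay * iterations);
}

// Example: lr_0 = 0.1, decay = 0.01 -> after 100 iterations, lr is about 0.05.
console.log(decayedLearningRate(0.1, 0.01, 100));
```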
src/optimizers/rmsprop_optimizer.ts, line 152 at r5 (raw file):
Previously, caisq (Shanqing Cai) wrote…
Done. I named the method `saveIterations()`.
SGTM.
FEATURE
- Currently defaulting `includeOptimizers` to `false`, which is completely backward compatible. We can decide on whether we'll change it to `true` for certain environments later.
- Allow loss names to be Python-style snake case.
Towards: tensorflow/tfjs#83
Depends on tensorflow/tfjs-core#1714
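A hypothetical usage sketch of the flag described above (the `includeOptimizers` name is taken from the description; the exact tfjs-layers save-config surface is an assumption here, hence the cast):

```ts
import * as tf from '@tensorflow/tfjs';

async function saveBoth(model: tf.LayersModel) {
  // Default (backward compatible): optimizer state is not saved.
  await model.save('indexeddb://my-model');
  // Opt in to saving optimizer weights, e.g. to resume training later.
  // NOTE: the config field name is assumed from the PR description.
  await model.save(
      'indexeddb://my-model-with-optimizer', {includeOptimizers: true} as any);
}
```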
FEATURE
- `className`s of the optimizers align with TensorFlow (Python)
- `getWeights()` for weight saving
- `setWeights()` for weight loading
Towards tensorflow/tfjs#83