Conversation
Can you add some unit tests for this?
Thanks for the awesome PR!
Reviewable status: 0 of 1 approvals obtained (waiting on @nsthorat and @VariableVasasMT)
src/ops/dropout.ts, line 1 at r1 (raw file):
import {keep, tidy} from '../globals';
add the license at the top of this file
src/ops/dropout.ts, line 20 at r1 (raw file):
Previously, VariableVasasMT wrote…
moved this to core ops as well; I might move it to a separate file
We don't need a scalar cache here at all.
src/ops/dropout.ts, line 33 at r1 (raw file):
/** * Sets entries in `x` to zero at random, while scaling the entire tensor. *
a code snippet would be great :)
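For example, a usage snippet along these lines might fit (just a sketch; the export name `tf.dropout` and whether `level` is a `Scalar` or a plain number are assumptions here, since the public API is still being decided):

```ts
// Hypothetical doc-comment snippet; the exact public API is an assumption.
const x = tf.tensor2d([[1, 2], [3, 4]]);
const level = tf.scalar(0.5);    // probability of zeroing each entry
const y = tf.dropout(x, level);  // kept entries are scaled by 1 / (1 - level)
y.print();
```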
src/ops/dropout.ts, line 41 at r1 (raw file):
* @returns Result of the dropout operation. */
remove this empty line
src/ops/dropout.ts, line 44 at r1 (raw file):
function dropout_( x: Tensor, level: Scalar, noiseShape?: number[], seed?: number): Tensor { return tidy(() => {
remove this tidy, the op() wrapper does that
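Roughly what the op-wrapped version looks like without the explicit `tidy` (a sketch only; the import paths and the cleanup behavior of the `op()` wrapper are assumptions based on the usual tfjs-core pattern):

```ts
import {op} from './operation';             // path assumed
import {Scalar, Tensor} from '../tensor';   // path assumed
import {randomUniform} from './array_ops';  // path assumed
import {div, sub} from './binary_ops';      // path assumed
import {step} from './unary_ops';           // path assumed

function dropout_(
    x: Tensor, level: Scalar, noiseShape?: number[], seed?: number): Tensor {
  // Keep mask: 1 where a uniform sample exceeds `level`, 0 otherwise.
  let multiplier = step(sub(randomUniform(x.shape, 0, 1, 'float32'), level));
  // Scale kept elements so the expected sum is unchanged.
  multiplier = (div(1, sub(1, level)) as Scalar).mul(multiplier);
  return x.mul(multiplier);
}

export const dropout = op({dropout_});
```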
src/ops/dropout.ts, line 45 at r1 (raw file):
x: Tensor, level: Scalar, noiseShape?: number[], seed?: number): Tensor { return tidy(() => { // TODO(cais): Switch to deeplearn.js implementation of dropout when it
remove this comment
src/ops/dropout.ts, line 56 at r1 (raw file):
} let multiplier = step( add(neg(level) as Scalar, randomUniform(x.shape, 0, 1, 'float32')));
step(sub(randomUniform(x.shape, 0, 1, 'float32'), level))
src/ops/dropout.ts, line 59 at r1 (raw file):
// Scale the kept elements, so the expected sum is unchanged. multiplier = mul(div(getScalar(1), sub(getScalar(1), level)) as Scalar, multiplier);
multiplier = (div(1, sub(1, level)) as Scalar).mul(multiplier)
src/ops/dropout.ts, line 60 at r1 (raw file):
multiplier = mul(div(getScalar(1), sub(getScalar(1), level)) as Scalar, multiplier); return mul(x, multiplier);
return x.mul(multiplier);
src/ops/dropout.ts, line 65 at r1 (raw file):
export const dropout = op({dropout_}); export const getScalar = op({getScalar_});
remove this
Reviewable status: 0 of 1 approvals obtained (waiting on @VariableVasasMT)
src/ops/dropout.ts, line 56 at r1 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
step(sub(randomUniform(x.shape, 0, 1, 'float32'), level))
I have been reading for two days, but I can't understand the reason for add(neg(level)) rather than just level. Could you please point me to a source to read about it, so that I can understand it better?
I will be adding tests soon.
Reviewable status: 0 of 1 approvals obtained (waiting on @VariableVasasMT)
src/ops/dropout.ts, line 56 at r1 (raw file):
Previously, VariableVasasMT (VariableVasas) wrote…
I have been reading for two days, but I can't understand the reason for add(neg(level)) rather than just level. Could you please point me to a source to read about it, so that I can understand it better?
I read http://www.cs.toronto.edu/~rsalakhu/papers/srivastava14a.pdf and found that the random array generation uses a Bernoulli distribution. After reading the code at https://github.com/numpy/numpy/blob/v1.13.3/numpy/random/mtrand/distributions.c#L442, I know f(X=k) = (p^k)*((1-p)^(n-k)). I get that randomUniform generates the numbers for the multiplier, but I don't understand the sub() and step() application.
Reviewed 3 of 3 files at r2.
Reviewable status: 0 of 1 approvals obtained (waiting on @nsthorat)
Reviewable status: 0 of 1 approvals obtained (waiting on @nsthorat)
src/ops/dropout.ts, line 56 at r1 (raw file):
Previously, VariableVasasMT (VariableVasas) wrote…
I read http://www.cs.toronto.edu/~rsalakhu/papers/srivastava14a.pdf and found that the random array generation uses a Bernoulli distribution. After reading the code at https://github.com/numpy/numpy/blob/v1.13.3/numpy/random/mtrand/distributions.c#L442, I know f(X=k) = (p^k)*((1-p)^(n-k)). I get that randomUniform generates the numbers for the multiplier, but I don't understand the sub() and step() application.
I think the math behind dropout is very simple. The `level` parameter is just the Bernoulli probability of an element of the tensor being set to 0. By doing `sub` with a [0, 1]-uniformly distributed set of numbers and `level`, we make a `level` fraction of the elements negative, in expectation. Then the `step` sets the value of those negative elements to zero and the rest to one.
See the implementation in TensorFlow (Python), which uses `floor` but is mathematically equivalent:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/nn_ops.py#L2323
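As a concrete illustration of that explanation (the values in the comments are made-up examples, not actual outputs):

```ts
const level = tf.scalar(0.25);                           // drop probability
const uniform = tf.randomUniform([4], 0, 1, 'float32');  // e.g. [0.10, 0.60, 0.30, 0.90]
// Subtracting `level` makes a ~`level` fraction of entries negative in expectation.
const shifted = tf.sub(uniform, level);                  // e.g. [-0.15, 0.35, 0.05, 0.65]
// step() maps values <= 0 to 0 and positive values to 1, giving the keep mask.
const mask = tf.step(shifted);                           // e.g. [0, 1, 1, 1]
// Scaling kept entries by 1 / (1 - level) keeps the expected sum unchanged.
const scaledMask = mask.mul(tf.div(1, tf.sub(1, level)));
```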
Reviewable status: 0 of 1 approvals obtained
src/ops/dropout.ts, line 56 at r1 (raw file):
Previously, caisq (Shanqing Cai) wrote…
I think the math behind dropout is very simple. The `level` parameter is just the Bernoulli probability of an element of the tensor being set to 0. By doing `sub` with a [0, 1]-uniformly distributed set of numbers and `level`, we make a `level` fraction of the elements negative, in expectation. Then the `step` sets the value of those negative elements to zero and the rest to one.
See the implementation in TensorFlow (Python), which uses `floor` but is mathematically equivalent:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/nn_ops.py#L2323
Thank you, I really appreciate it.
Reviewed 1 of 1 files at r1.
Reviewable status: 0 of 1 approvals obtained
Reviewed 1 of 1 files at r3.
Reviewable status: 0 of 1 approvals obtained
Reviewed 2 of 3 files at r2, 1 of 1 files at r3.
Reviewable status: 0 of 1 approvals obtained
Reviewable status: 0 of 1 approvals obtained (waiting on @VariableVasasMT)
src/ops/dropout.ts, line 41 at r3 (raw file):
function dropout_(
What will be the public namespace of this new function? TensorFlow (Python) has `tf.nn.dropout`. Should we try to match that here, @nsthorat?
src/ops/dropout.ts, line 44 at r3 (raw file):
TODO: implement non default noise shape (VariableVasasMT)
nit: Use TODO formatting: TODO(VariableVasasMT): ...
src/ops/dropout.ts, line 54 at r3 (raw file):
(div(1, sub(1, level)) as Scalar).mul(multiplier)
Can this be simplified as multiplier.div(sub(1, level) as Scalar)?
Reviewable status: 0 of 1 approvals obtained (waiting on @VariableVasasMT and @caisq)
src/ops/dropout.ts, line 44 at r3 (raw file):
Previously, caisq (Shanqing Cai) wrote…
TODO: implement non default noise shape (VariableVasasMT)
nit: Use TODO formatting: TODO(VariableVasasMT): ...
Done.
src/ops/dropout.ts, line 54 at r3 (raw file):
Previously, caisq (Shanqing Cai) wrote…
(div(1, sub(1, level)) as Scalar).mul(multiplier)
Can this be simplified as multiplier.div(sub(1, level) as Scalar)?
I think multiplying a scalar across all the tensor indexes is cheaper than division in terms of computational cost, so I think we should stick with multiplication, especially with floating-point calculations. I know it's an unnecessary optimization, but looking at the usage of the library, I think we should be careful.
Reviewable status: 0 of 1 approvals obtained (waiting on @VariableVasasMT and @caisq)
src/ops/dropout.ts, line 54 at r3 (raw file):
Previously, VariableVasasMT (VariableVasas) wrote…
I think multiplying a scalar across all the tensor indexes is cheaper than division in terms of computational cost, so I think we should stick with multiplication, especially with floating-point calculations. I know it's an unnecessary optimization, but looking at the usage of the library, I think we should be careful.
Why would this be cheaper? In this implementation we have an extra multiplication shader, which is an extra draw call. I agree with @caisq's request :)
Reviewed 1 of 1 files at r4.
Reviewable status: 0 of 1 approvals obtained (waiting on @VariableVasasMT and @caisq)
src/ops/dropout.ts, line 54 at r3 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
Why would this be cheaper? In this implementation we have an extra multiplication shader, which is an extra draw call. I agree with @caisq's request :)
Fair enough. I thought it would be cheaper because in the end we are either multiplying or dividing each index of a tensor (e.g. `const res = dy.div($b.toFloat());`), so dividing once and multiplying again and again might be cheaper.
Reviewable status: 0 of 1 approvals obtained (waiting on @caisq)
src/ops/dropout.ts, line 41 at r3 (raw file):
Previously, caisq (Shanqing Cai) wrote…
function dropout_(
What will be the public namespace of this new function? TensorFlow (Python) has `tf.nn.dropout`. Should we try to match that here, @nsthorat?
@nsthorat I don't have any idea about this; it's your call.
Reviewable status: 0 of 1 approvals obtained (waiting on @caisq)
src/ops/dropout.ts, line 54 at r3 (raw file):
Previously, VariableVasasMT (VariableVasas) wrote…
Fair enough. I thought it would be cheaper because in the end we are either multiplying or dividing each index of a tensor (e.g. `const res = dy.div($b.toFloat());`), so dividing once and multiplying again and again might be cheaper.
I will test it for large tensors and see.
Reviewable status: 0 of 1 approvals obtained (waiting on @caisq)
src/ops/dropout.ts, line 54 at r3 (raw file):
Previously, VariableVasasMT (VariableVasas) wrote…
I will test it for large tensors and see.
*************** with tensor x = tf.range(1, 4000001).reshape([2000000, 2]); *******************
experiment with:
multiplier = multiplier.div(sub(1, level) as Scalar);
Chrome 70.0.3538 (Linux 0.0.0) SLOW 1.043 secs: ModelManagement test-webgl2 {"WEBGL_VERSION":2} Failed copyModel
LOG: 351
Chrome 70.0.3538 (Linux 0.0.0) SLOW 3.833 secs: dropout test-webgl1 {"WEBGL_VERSION":1} Level = 0
LOG: 333
Chrome 70.0.3538 (Linux 0.0.0) SLOW 1.472 secs: dropout test-webgl1 {"WEBGL_VERSION":1} Level = 0.75
LOG: 201
Chrome 70.0.3538 (Linux 0.0.0) SLOW 3.306 secs: dropout test-webgl2 {"WEBGL_VERSION":2} Level = 0
LOG: 117
Chrome 70.0.3538 (Linux 0.0.0) SLOW 1.218 secs: dropout test-webgl2 {"WEBGL_VERSION":2} Level = 0.75
LOG: 510
Chrome 70.0.3538 (Linux 0.0.0) SLOW 10.455 secs: dropout test-cpu {"HAS_WEBGL":false} Level = 0
LOG: 1398
experiment with:
multiplier = (div(1, sub(1, level)) as Scalar).mul(multiplier);
LOG: 357
Chrome 70.0.3538 (Linux 0.0.0) SLOW 3.954 secs: dropout test-webgl1 {"WEBGL_VERSION":1} Level = 0
LOG: 306
Chrome 70.0.3538 (Linux 0.0.0) SLOW 1.543 secs: dropout test-webgl1 {"WEBGL_VERSION":1} Level = 0.75
LOG: 200
Chrome 70.0.3538 (Linux 0.0.0) SLOW 3.08 secs: dropout test-webgl2 {"WEBGL_VERSION":2} Level = 0
LOG: 117
Chrome 70.0.3538 (Linux 0.0.0) SLOW 1.247 secs: dropout test-webgl2 {"WEBGL_VERSION":2} Level = 0.75
LOG: 488
Chrome 70.0.3538 (Linux 0.0.0) SLOW 3.151 secs: dropout test-cpu {"HAS_WEBGL":false} Level = 0
LOG: 965
Chrome 70.0.3538 (Linux 0.0.0) SLOW 1.775 secs: dropout test-cpu {"HAS_WEBGL":false} Level = 0.75
***************** with tensor x = tf.range(1, 8000001).reshape([4000000, 2]); *****************
experiment with:
multiplier = multiplier.div(sub(1, level) as Scalar);
Chrome 70.0.3538 (Linux 0.0.0) SLOW 7.747 secs: dropout test-webgl1 {"WEBGL_VERSION":1} Level = 0
LOG: 584
Chrome 70.0.3538 (Linux 0.0.0) SLOW 3.039 secs: dropout test-webgl1 {"WEBGL_VERSION":1} Level = 0.75
LOG: 309
Chrome 70.0.3538 (Linux 0.0.0) SLOW 6.898 secs: dropout test-webgl2 {"WEBGL_VERSION":2} Level = 0
LOG: 252
Chrome 70.0.3538 (Linux 0.0.0) SLOW 3.061 secs: dropout test-webgl2 {"WEBGL_VERSION":2} Level = 0.75
LOG: 1542
Chrome 70.0.3538 (Linux 0.0.0) SLOW 11.635 secs: dropout test-cpu {"HAS_WEBGL":false} Level = 0
LOG: 4030
Chrome 70.0.3538 (Linux 0.0.0) SLOW 9.814 secs: dropout test-cpu {"HAS_WEBGL":false} Level = 0.75
|||||||||||||||||||||||||||||||||||||||||||||||||||||
LOG: 599
Chrome 70.0.3538 (Linux 0.0.0) SLOW 6.615 secs: dropout test-webgl1 {"WEBGL_VERSION":1} Level = 0
LOG: 542
Chrome 70.0.3538 (Linux 0.0.0) SLOW 2.858 secs: dropout test-webgl1 {"WEBGL_VERSION":1} Level = 0.75
LOG: 297
Chrome 70.0.3538 (Linux 0.0.0) SLOW 6.354 secs: dropout test-webgl2 {"WEBGL_VERSION":2} Level = 0
LOG: 259
Chrome 70.0.3538 (Linux 0.0.0) SLOW 2.358 secs: dropout test-webgl2 {"WEBGL_VERSION":2} Level = 0.75
LOG: 1122
Chrome 70.0.3538 (Linux 0.0.0) SLOW 10.62 secs: dropout test-cpu {"HAS_WEBGL":false} Level = 0
LOG: 4091
Chrome 70.0.3538 (Linux 0.0.0) SLOW 11.404 secs: dropout test-cpu {"HAS_WEBGL":false} Level = 0.75
experiment with:
multiplier = (div(1, sub(1, level)) as Scalar).mul(multiplier);
LOG: 597
Chrome 70.0.3538 (Linux 0.0.0) SLOW 6.719 secs: dropout test-webgl1 {"WEBGL_VERSION":1} Level = 0
LOG: 554
Chrome 70.0.3538 (Linux 0.0.0) SLOW 2.892 secs: dropout test-webgl1 {"WEBGL_VERSION":1} Level = 0.75
LOG: 311
Chrome 70.0.3538 (Linux 0.0.0) SLOW 6.256 secs: dropout test-webgl2 {"WEBGL_VERSION":2} Level = 0
LOG: 248
Chrome 70.0.3538 (Linux 0.0.0) SLOW 3.131 secs: dropout test-webgl2 {"WEBGL_VERSION":2} Level = 0.75
LOG: 1484
Chrome 70.0.3538 (Linux 0.0.0) SLOW 12.263 secs: dropout test-cpu {"HAS_WEBGL":false} Level = 0
LOG: 3697
Chrome 70.0.3538 (Linux 0.0.0) SLOW 9.338 secs: dropout test-cpu {"HAS_WEBGL":false} Level = 0.75
|||||||||||||||||||||||||||||||||||||||||||||||||||||
LOG: 606
Chrome 70.0.3538 (Linux 0.0.0) SLOW 7.311 secs: dropout test-webgl1 {"WEBGL_VERSION":1} Level = 0
LOG: 537
Chrome 70.0.3538 (Linux 0.0.0) SLOW 2.881 secs: dropout test-webgl1 {"WEBGL_VERSION":1} Level = 0.75
LOG: 307
Chrome 70.0.3538 (Linux 0.0.0) SLOW 6.488 secs: dropout test-webgl2 {"WEBGL_VERSION":2} Level = 0
LOG: 248
Chrome 70.0.3538 (Linux 0.0.0) SLOW 2.539 secs: dropout test-webgl2 {"WEBGL_VERSION":2} Level = 0.75
LOG: 1111
Chrome 70.0.3538 (Linux 0.0.0) SLOW 10.807 secs: dropout test-cpu {"HAS_WEBGL":false} Level = 0
LOG: 3456
Chrome 70.0.3538 (Linux 0.0.0) SLOW 9.003 secs: dropout test-cpu {"HAS_WEBGL":false} Level = 0.75
So it is equivalent to @caisq's suggestion for larger tensors and adds to code readability. I guess I was wrong; I will make the changes ;)
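For reference, one way such a comparison could be scripted directly (a sketch only; the tensor size and the use of `tf.time` here are illustrative, not the actual test harness above):

```ts
async function compareScalings() {
  const x = tf.randomUniform([4000000, 2]);
  const level = tf.scalar(0.75);

  // Variant A: divide every element by (1 - level).
  const divTime = await tf.time(() => x.div(tf.sub(1, level)).dataSync());
  // Variant B: compute 1 / (1 - level) once, then multiply every element.
  const mulTime = await tf.time(() => x.mul(tf.div(1, tf.sub(1, level))).dataSync());

  console.log(`div: ${divTime.wallMs} ms, mul: ${mulTime.wallMs} ms`);
}
```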
Reviewable status: complete! 1 of 1 approvals obtained (waiting on @caisq and @nsthorat)
a discussion (no related file):
The branch is updated on GitHub, but Reviewable shows "Pull request branch is out-of-date with target branch and cannot be merged".
Reviewable status: complete! 1 of 1 approvals obtained (waiting on @caisq and @nsthorat)
a discussion (no related file):
Previously, VariableVasasMT (VariableVasas) wrote…
The branch is updated on GitHub, but Reviewable shows "Pull request branch is out-of-date with target branch and cannot be merged".
@dsmilkov please review, I have applied all the changes you asked for.
Reviewed 1 of 1 files at r19.
Reviewable status: complete! 1 of 1 approvals obtained (waiting on @caisq and @nsthorat)
Thank you for the great contribution! And sorry for the delay. This is good to go! Woohoo!
Reviewed 1 of 1 files at r15, 1 of 1 files at r19.
Reviewable status: complete! 2 of 1 approvals obtained (waiting on @caisq and @nsthorat)
src/ops/mersenne-rand.ts, line 1 at r10 (raw file):
Previously, VariableVasasMT (VariableVasas) wrote…
This file is deleted but is somehow still showing here; I think Reviewable is displaying it as deleted.
Sounds good!
One quick ask: the build fails since you added a new unit test and now have to re-run …
@dsmilkov fixed, the link you gave only accepts …
Reviewed 1 of 1 files at r21.
Reviewable status: complete! 2 of 1 approvals obtained (waiting on @caisq and @nsthorat)
Change: This PR cleans up a TODO. As the dropout op has been checked in through PR [tensorflow/tfjs-core#1343](tensorflow/tfjs-core#1343) and released, this PR safely switches the `K.dropout` implementation to tfjs-core's `dropout` op. Switching the `K.dropout` implementation would benefit further dropout-related development. Some changes in switching:
* As the `tfc.dropout` op asserts `NotImplementedError` for `noiseShape`, removed the assertion in `K.dropout`.
* As `seed` is already supported in `tf.randomUniform`, removed the `NotImplementedError` for it.
* The test cases for `SimpleRNN Tensor`, `GRU Tensor`, and `LSTM Tensor` spied on `tfc.randomUniform`, which `K.dropout` used before; as the `tfc.randomUniform`-based implementation is gone, replaced the spy on `tfc.randomUniform` with a spy on `tfc.dropout`, which is called the same number of times.
* When an RNN cell calls `K.dropout`, it creates an anonymous Tensor, for example [src/layers/recurrent.ts#L2563](https://github.com/tensorflow/tfjs-layers/blob/master/src/layers/recurrent.ts#L2563), which we cannot track, so removed the `toHaveBeenCalledWith` check.
DEV
This PR is a follow-up to [tfjs-core#1343](#1343), which added the `dropout` op; the dropout feature was initially requested in [tfjs#117](tensorflow/tfjs#117). This PR cleans up a TODO, `implement non default noise shape`. The implementation of `noiseShape` aligns with TensorFlow Python's dropout API. The `noiseShape` feature would benefit further dropout-related development, for example making `tf.layers.dropout` support a `noiseShape` configuration.
**Related PR:**
* Make dropout layer support non-default noiseShape and seed: [tensorflow/tfjs-layers#556](tensorflow/tfjs-layers#556)
**Reference:**
* [TensorFlow tf.nn.dropout documentation](https://www.tensorflow.org/api_docs/python/tf/nn/dropout)
* [TensorFlow tf.nn.dropout implementation](https://github.com/tensorflow/tensorflow/blob/r1.13/tensorflow/python/ops/nn_ops.py#L2982-L3054)
* [TensorFlow _get_noise_shape implementation](https://github.com/tensorflow/tensorflow/blob/r1.13/tensorflow/python/ops/nn_ops.py#L2903-L2925)
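To make the `noiseShape` semantics concrete, a hypothetical example (assuming the `tf.dropout(x, rate, noiseShape?, seed?)` signature that the TensorFlow references above describe):

```ts
// With noiseShape [3, 1], one mask value is drawn per row and broadcast across the
// columns, so each row is kept (and rescaled) or dropped as a whole.
const x = tf.ones([3, 4]);
const y = tf.dropout(x, 0.5, /* noiseShape */ [3, 1]);
y.print();
```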
Description
Added dropout to ops and moved getScalar to core ops as well. Will open another pull request to implement non-default noiseShape.
Fixes tensorflow/tfjs#117
For repository owners only:
Please remember to apply all applicable tags to your pull request.
Tags: FEATURE, BREAKING, BUG, PERF, DEV, DOC, SECURITY
For more info see: https://github.com/tensorflow/tfjs/blob/master/DEVELOPMENT.md