Speedup matmul when the matrices are vectors #1379
Conversation
Hold on for a bit, I'm going to generalize this further for any matrix-vector, vector-matrix product based on the excellent observation of @DirkToewe in tensorflow/tfjs#881 (comment).
This is ready for review.
Reviewable status: 0 of 1 approvals obtained (waiting on @dsmilkov, @nsthorat, and @annxingyuan)
src/ops/matmul_test.ts, line 293 at r2 (raw file):
}); it('batched matmul with the matrices being vectors', () => {
test for matrix x vector and vice versa?
Reviewed 2 of 2 files at r2.
Reviewable status: complete! 1 of 1 approvals obtained (waiting on @dsmilkov and @annxingyuan)
Reviewable status: complete! 1 of 1 approvals obtained
src/ops/matmul_test.ts, line 293 at r2 (raw file):
Previously, nsthorat (Nikhil Thorat) wrote…
test for matrix x vector and vice versa?
Done.
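For context, tests along those lines might look roughly like this (a sketch, not the exact cases added in r3; `expectArraysClose` is the helper from `src/test_util.ts`):

```ts
// Sketch of matrix x vector and vector x matrix coverage for tf.matMul.
it('matmul with matrix x vector', () => {
  const a = tf.tensor2d([1, 2, 3, 4], [2, 2]);   // [[1, 2], [3, 4]]
  const v = tf.tensor2d([1, 1], [2, 1]);         // column vector
  const result = tf.matMul(a, v);                // shape [2, 1]
  expect(result.shape).toEqual([2, 1]);
  expectArraysClose(result, [3, 7]);
});

it('matmul with vector x matrix', () => {
  const v = tf.tensor2d([1, 1], [1, 2]);         // row vector
  const a = tf.tensor2d([1, 2, 3, 4], [2, 2]);
  const result = tf.matMul(v, a);                // shape [1, 2]
  expect(result.shape).toEqual([1, 2]);
  expectArraysClose(result, [4, 6]);
});
```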
This looks great! Will be cool to see whether any users of models in the wild report speedups.
Reviewable status: complete! 2 of 1 approvals obtained (waiting on @dsmilkov)
src/ops/matmul_test.ts, line 322 at r3 (raw file):
expectArraysClose(result, [100, 100, 100, 100, 100, 100]); });
This change has broken the Travis build and needs to be looked into (@dsmilkov).
The failure is specific to Safari.
Safari 11.1.2 (Mac OS X 10.13.6) matmul cpu {"HAS_WEBGL":false} batched matmul with matrix x vector FAILED
Error: Arrays differ: actual[1] = 99.89019775390625, expected[1] = 100.
Actual: 100.00019836425781,99.89019775390625,99.5501937866211,99.55020141601562,99.3102035522461,99.29019927978516.
Expected: 100,100,100,100,100,100. in src/test_util.js (line 71)
expectArraysClose@src/test_util.ts:100:10 <- src/test_util.js:71:28
src/ops/matmul_test.ts:310:22 <- src/ops/matmul_test.js:219:38
<Jasmine>
#1379 added a unit test that failed in Travis due to numerical precision: https://travis-ci.org/tensorflow/tfjs-core/jobs/451654174#L801
This PR:
- fixes that test
- also fixes `test-travis.sh` to make sure we don't forward webgl --> cpu except for the last browser run -- probably caused during conflict resolution of #1371 and #1372 (https://reviewable.io/reviews/tensorflow/tfjs-core/1371#-LQ_a2y92YYLRMiYWqhC)
DEV
In the WebGL backend, we can parallelize dot products by calling
tf.mul(x, y).sum()
-- this is because WebGL's sum() uses divide-and-conquer and runs in O(sqrt(N)) passes.
Have matmul() in WebGL call
tf.mul(x, y).sum()
when we have matrix x vector, vector x matrix, or vector x vector and the shared dimension is longer than 1000 (see benchmark below).
Fixes tensorflow/tfjs#881
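For intuition, here is a minimal userland sketch of the dispatch idea (not the actual backend change, which lives inside the WebGL matMul kernel; `SHARED_DIM_THRESHOLD` and `dotViaMulSum` are made-up names for illustration):

```ts
import * as tf from '@tensorflow/tfjs-core';

// Hypothetical cutoff; the benchmark below suggests the mul+sum path starts
// winning once the shared dimension exceeds roughly 1000.
const SHARED_DIM_THRESHOLD = 1000;

// Sketch: compute a [1, K] x [K, 1] product as an elementwise multiply
// followed by sum(), which the WebGL backend reduces divide-and-conquer
// style in O(sqrt(K)) passes instead of one O(K) loop per output element.
function dotViaMulSum(a: tf.Tensor2D /* [1, K] */, b: tf.Tensor2D /* [K, 1] */): tf.Tensor {
  const k = a.shape[1];
  if (k <= SHARED_DIM_THRESHOLD) {
    return tf.matMul(a, b);  // small K: the regular matmul kernel is fine
  }
  // Flatten both sides to [K] so mul() stays elementwise (no broadcast to
  // [K, K]), then reduce and restore the [1, 1] result shape.
  return tf.mul(a.reshape([k]), b.reshape([k])).sum().reshape([1, 1]);
}
```

The matrix x vector and vector x matrix cases follow the same pattern: broadcast-multiply the vector against the matrix and reduce along the shared axis with `sum(axis)`.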
Benchmark of [1,K] x [K,1] for different Ks:
=== OLD ===
LOG: 'K = 1, Took 0.72 ms'
LOG: 'K = 10, Took 0.67 ms'
LOG: 'K = 100, Took 0.59 ms'
LOG: 'K = 1000, Took 0.64 ms' --- slower after K>1000.
LOG: 'K = 10000, Took 1.18 ms'
LOG: 'K = 100000, Took 10.66 ms'
LOG: 'K = 1000000, Took 110.90 ms'
=== NEW ===
LOG: 'K = 1, Took 0.62 ms'
LOG: 'K = 10, Took 0.62 ms'
LOG: 'K = 100, Took 0.67 ms'
LOG: 'K = 1000, Took 0.67 ms' --- faster after K>1000.
LOG: 'K = 10000, Took 1.12 ms'
LOG: 'K = 100000, Took 0.88 ms'
LOG: 'K = 1000000, Took 0.94 ms'
Benchmarked mobilenet, coco-ssd, and posenet -- no perf impact there.
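A micro-benchmark along those lines can be reproduced with something like the following (a sketch, not the exact harness used above; `tf.time` reports wall and kernel time for the provided function):

```ts
import * as tf from '@tensorflow/tfjs-core';

// Rough sketch of the [1,K] x [K,1] timing loop above.
async function benchDot(k: number): Promise<number> {
  const a = tf.randomNormal([1, k]) as tf.Tensor2D;
  const b = tf.randomNormal([k, 1]) as tf.Tensor2D;

  // Warm up once so shader compilation and texture uploads are not timed.
  const warm = tf.matMul(a, b);
  await warm.data();
  warm.dispose();

  const info = await tf.time(() => tf.matMul(a, b).dispose());
  a.dispose();
  b.dispose();
  return info.wallMs;
}

async function run() {
  for (const k of [1, 10, 100, 1000, 10000, 100000, 1000000]) {
    console.log(`K = ${k}, Took ${(await benchDot(k)).toFixed(2)} ms`);
  }
}
```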
PERF