Skip to content
This repository was archived by the owner on Aug 15, 2019. It is now read-only.

Speedup matmul when the matrices are vectors #1379

Merged
merged 8 commits into from
Nov 6, 2018
Merged

Speedup matmul when the matrices are vectors #1379

merged 8 commits into from
Nov 6, 2018

Conversation

dsmilkov
Copy link
Contributor

@dsmilkov dsmilkov commented Nov 5, 2018

In the WebGL backend, we can parallelize dot products by calling tf.mul(x,y).sum() -- this is due to the fact that the WebGL's sum() uses divide-and-conquer and is O(sqrt(N)).

Have matmul() in WebGL call tf.mul(x, y).sum() when we have matrix x vector, vector x matrix, or vector x vector and the shared dimension is longer than 1000 (see benchmark below).

Fixes tensorflow/tfjs#881

Benchmark of [1,K]x[K,1] for different Ks:
=== OLD ===
LOG: 'K = 1, Took 0.72 ms'
LOG: 'K = 10, Took 0.67 ms'
LOG: 'K = 100, Took 0.59 ms'
LOG: 'K = 1000, Took 0.64 ms' --- slower after K>1000.
LOG: 'K = 10000, Took 1.18 ms'
LOG: 'K = 100000, Took 10.66 ms'
LOG: 'K = 1000000, Took 110.90 ms'

=== NEW ===
LOG: 'K = 1, Took 0.62 ms'
LOG: 'K = 10, Took 0.62 ms'
LOG: 'K = 100, Took 0.67 ms'
LOG: 'K = 1000, Took 0.67 ms' --- faster after K>1000.
LOG: 'K = 10000, Took 1.12 ms'
LOG: 'K = 100000, Took 0.88 ms'
LOG: 'K = 1000000, Took 0.94 ms'

Benchmarked mobilenet, coco-ssd, and posenet -- no perf impact there.

PERF


This change is Reviewable

@dsmilkov
Copy link
Contributor Author

dsmilkov commented Nov 5, 2018

Hold on for a bit, I'm going to generalize this further for any matrix-vector, vector-matrix product based on the excellent observation of @DirkToewe in tensorflow/tfjs#881 (comment)

@dsmilkov
Copy link
Contributor Author

dsmilkov commented Nov 6, 2018

This is ready for review

Copy link
Contributor

@nsthorat nsthorat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 approvals obtained (waiting on @dsmilkov, @nsthorat, and @annxingyuan)


src/ops/matmul_test.ts, line 293 at r2 (raw file):

  });

  it('batched matmul with the matrices being vectors', () => {

test for matrix x vector and vise versa?

Copy link
Collaborator

@annxingyuan annxingyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r2.
Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @dsmilkov and @annxingyuan)

Copy link
Contributor Author

@dsmilkov dsmilkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 1 approvals obtained


src/ops/matmul_test.ts, line 293 at r2 (raw file):

Previously, nsthorat (Nikhil Thorat) wrote…

test for matrix x vector and vise versa?

Done.

Copy link
Collaborator

@annxingyuan annxingyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Will be cool to see whether any users of models in the wild report speedups.

@dsmilkov dsmilkov merged commit 5b4e187 into master Nov 6, 2018
@dsmilkov dsmilkov deleted the webgl-matmul branch November 6, 2018 16:34
Copy link
Contributor

@VariableVasasMT VariableVasasMT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 2 of 1 approvals obtained (waiting on @dsmilkov)


src/ops/matmul_test.ts, line 322 at r3 (raw file):

    expectArraysClose(result, [100, 100, 100, 100, 100, 100]);
  });

This change has broken the travis build and needs to be looked into ( @dsmilkov )
This is specifically for safari.

Safari 11.1.2 (Mac OS X 10.13.6) matmul cpu {"HAS_WEBGL":false} batched matmul with matrix x vector FAILED
	Error: Arrays differ: actual[1] = 99.89019775390625, expected[1] = 100.
	Actual:   100.00019836425781,99.89019775390625,99.5501937866211,99.55020141601562,99.3102035522461,99.29019927978516.
	Expected: 100,100,100,100,100,100. in src/test_util.js (line 71)
	expectArraysClose@src/test_util.ts:100:10 <- src/test_util.js:71:28
	src/ops/matmul_test.ts:310:22 <- src/ops/matmul_test.js:219:38
	<Jasmine>

dsmilkov added a commit that referenced this pull request Nov 7, 2018
#1379 added a unit test that failed in travis due to numerical precision: https://travis-ci.org/tensorflow/tfjs-core/jobs/451654174#L801

This PR:
- fixes that test
- Also fixes `test-travis.sh` to make sure we don't forward webgl --> cpu except for the last browser run -- probably caused during conflict resolving of #1371 and #1372 (https://reviewable.io/reviews/tensorflow/tfjs-core/1371#-LQ_a2y92YYLRMiYWqhC)

DEV
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants