-
Notifications
You must be signed in to change notification settings - Fork 943
Add packed arithmetic binary operation kernel. (#951) #1423
Conversation
Hey @astojilj - thank you so much for this contribution! Let me know when this is ready for review - currently the tests are failing (related to the new |
This is a fix to PR tensorflow#1423 issue - webgl testing environment is not runnable from yarn node-test.
This is a fix to PR tensorflow#1423 issue - webgl testing environment is not runnable from yarn node-test.
@annxingyuan, PTAL.
Thanks. I learned from those. In this case, I wanted to run existing tests, both with non-packed and packed operation and didn't write new tests for packed arithmetic ops. Code doing that is in 'arithmetic_test.ts'. |
Hey @astojilj, before I review this fully, would you mind running MobileNet v2 on an image and confirming that the logits are identical with your WEBGL_PACKED_BINARYOP flag on and off? I tried dropping your change into an internal model and the outputs seem to be different. Thank you! |
This is a fix to PR tensorflow#1423 issue - webgl testing environment is not runnable from yarn node-test.
After rebase to the latest master, also the tests started to fail. Fixed that (Updated packed sampling to the same approach as you did with batchnorm and clip. Thanks, much simpler now.) and the tests pass, locally. |
@annxingyuan PTAL. This should be ready now for review and verified with Deeplab V3+ MobileNet2 model.
I will move following content to bug report once I find the way to reproduce it in tests: Works fine with this in code:
Doesn't work (no errors in console but segmentation doesn't work):
Is there a demo code you're using to run it on image? |
This is a fix to PR tensorflow#1423 issue - webgl testing environment is not runnable from yarn node-test.
FEATURE - Doesn't support 5-D and 6-D. - Supports add(addN), sub, multiply, realDivide, floorDiv and pow. - Broadcasting is supported only when one of the operands is scalar or 1-D. - Added behind environmental flag 'WEBGL_PACK_BINARY_OPERATIONS'.
This is a fix to PR tensorflow#1423 issue - webgl testing environment is not runnable from yarn node-test.
Ensure the tests asuming 'add' is unpacked, run unpacked op. Minor update to the test to make it future proof. With or without this patch, there is still an issue with packed batchnorm on DeepLab3+ MobileNetV2 segmentation. I'm trying to reproduce it in unit tests... This one can be reviewed - there is an issue with Karma setup causing CI to fail today. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @astojilj thank you for this amazing contribution - just added some comments!
A few more things:
- Thanks for pointing out the batchNorm issue. I'm not sure whether this is related, but #1434 fixed a bug with packed broadcasting. Would you mind testing DeepLab3+ MobileNetV2 again against master to see whether it works now?
- I like the way you set up the tests as it increases coverage, and it's probably how we should be testing packed operations moving forward. Let me check with my team though before approving an infrastructural change like this.
- Could you run some benchmarks on WebGL 2 and 1 (ideally on an iPhone) with packing flags turned on? With your PR there should be a substantial swath of MobileNet2 that never leaves the packed path and I'd love to know what kinds of speedups we might expect.
By the way, we have a standalone benchmarking page here: https://github.com/tensorflow/tfjs-core/blob/master/integration_tests/benchmarks/benchmark.html |
I need to update this. It is probably better if PR #1448 is reviewed first as:
I'd like to update this patch to the same approach there: activateWebGLPackedTestEnv.
I plan to check with this and PR #1448, how many expensive reshapes are happening, then test on iPhone with this and different packaging scheme (Issue 949). |
Do you have a hypothesis on the higher GPU memory allocation issue? What do you mean you noticed an expensive reshape on every frame running on const node filter in conv2d? I'm not sure I understand. |
Let me cover this offline. |
@annxingyuan I've setup the benchmarks for all-packed-operations pipeline. Please check the results and there you'll also find a link to run it. Code includes non-minified version of tfjs-core if you'd like to debug. Additionally, you might want to always return true in isReshapeFree - that modification renders the output invalid but it shows about 5% improvement when using GL_FLOAT. Otherwise, half float pipeline is much faster when build with only packed ops. On a related note, opened Issue 1047 Needless to say - if additional measurements are required, e.g. using batch normalization (I have replaced them all by graph transform tool with bias add operations) or separate measurements for packed arithmetics vs batch<->space, just ask. I will work on splitting #1448 as requested. How to proceed with this PR? |
Thank you so much for these benchmarks - you bring up a very interesting suggestion about enforcing half floats... I need to look at your results more closely, but let's first of all try to get this PR merged! The only remaining issue is with testing. I like the way you set the tests up but I don't want to introduce this infrastructural change quite yet... would you mind writing a small set of packed arithmetic tests? Before we merge your other PR (#1448) I will try to add a global packed testing environment so we don't have to continue replicating tests. You don't have to match the coverage on our existing arithmetic tests - could you just add 6 unit tests to
It doesn't matter to me which ops you test for the different scenarios - as long as each of your new ops gets hit. I realize this means much less coverage, but it's only temporary! |
Oh wait I'm realizing that this does not support broadcasting - so maybe the unit test scenarios I listed are not the right ones to target - my point was just that you can pick a few scenarios to test! |
@@ -52,7 +52,7 @@ export const POW = ` | |||
if(a < 0.0 && floor(b) < b){ | |||
return NAN; | |||
} | |||
return (round(mod(b, 2.0)) == 0 || round(mod(b, 2.0)) == 2) ? | |||
return (round(mod(b, 2.0)) != 1) ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@annxingyuan This change is outside the scope of the patch as it affects non packed ops. If needed I could take it out.
@annxingyuan, since we'll get packed environment in TEST_ENVS, I've opened Issue 1050 to cover it, copied and modified original arithmetic_test.js to arithmetic_packed_test.js to set WEBGL_PACK_BINARY_OPERATIONS before running tests, and added TODO to remove the arithmetic_packed_test.js when work on Issue 1050 is done. |
Awesome - really glad to have this in the library - thank you for your patience with this PR! I will try to get tensorflow/tfjs#1050 done ASAP. |
FEATURE
Description
packed binary operations.
For repository owners only:
Please remember to apply all applicable tags to your pull request.
Tags: FEATURE, BREAKING, BUG, PERF, DEV, DOC, SECURITY
For more info see: https://github.com/tensorflow/tfjs/blob/master/DEVELOPMENT.md
This change is