[SPARK-1780] Non-existent SPARK_DAEMON_OPTS is lurking around #751

andrewor14 · 2014-05-13T01:03:03Z

What they really mean is SPARK_DAEMON_***JAVA***_OPTS

AmplabJenkins · 2014-05-13T01:07:57Z

Merged build triggered.

AmplabJenkins · 2014-05-13T01:23:18Z

Merged build started.

AmplabJenkins · 2014-05-13T02:28:51Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-05-13T02:28:52Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14922/

What they really mean is SPARK_DAEMON_***JAVA***_OPTS
Author: Andrew Or <[email protected]>
Closes #751 from andrewor14/spark-daemon-opts and squashes the following commits:
70c41f9 [Andrew Or] SPARK_DAEMON_OPTS -> SPARK_DAEMON_JAVA_OPTS
(cherry picked from commit ba96bb3)
Signed-off-by: Patrick Wendell <[email protected]>

What they really mean is SPARK_DAEMON_***JAVA***_OPTS
Author: Andrew Or <[email protected]>
Closes apache#751 from andrewor14/spark-daemon-opts and squashes the following commits:
70c41f9 [Andrew Or] SPARK_DAEMON_OPTS -> SPARK_DAEMON_JAVA_OPTS

### What changes were proposed in this pull request?

Added optimizer rule `RemoveRedundantAggregates`. It removes redundant aggregates from a query plan. A redundant aggregate is an aggregate whose only goal is to keep distinct values, while its parent aggregate would ignore duplicate values.

The affected part of the query plan for TPCDS q87:

Before:
```
== Physical Plan ==
*(26) HashAggregate(keys=[], functions=[count(1)])
+- Exchange SinglePartition, true, [id=#785]
   +- *(25) HashAggregate(keys=[], functions=[partial_count(1)])
      +- *(25) HashAggregate(keys=[c_last_name#61, c_first_name#60, d_date#26], functions=[])
         +- *(25) HashAggregate(keys=[c_last_name#61, c_first_name#60, d_date#26], functions=[])
            +- *(25) HashAggregate(keys=[c_last_name#61, c_first_name#60, d_date#26], functions=[])
               +- *(25) HashAggregate(keys=[c_last_name#61, c_first_name#60, d_date#26], functions=[])
                  +- *(25) HashAggregate(keys=[c_last_name#61, c_first_name#60, d_date#26], functions=[])
                     +- Exchange hashpartitioning(c_last_name#61, c_first_name#60, d_date#26, 5), true, [id=#724]
                        +- *(24) HashAggregate(keys=[c_last_name#61, c_first_name#60, d_date#26], functions=[])
                           +- SortMergeJoin [coalesce(c_last_name#61, ), isnull(c_last_name#61), coalesce(c_first_name#60, ), isnull(c_first_name#60), coalesce(d_date#26, 0), isnull(d_date#26)], [coalesce(c_last_name#221, ), isnull(c_last_name#221), coalesce(c_first_name#220, ), isnull(c_first_name#220), coalesce(d_date#186, 0), isnull(d_date#186)], LeftAnti
                              :- ...
```

After:
```
== Physical Plan ==
*(26) HashAggregate(keys=[], functions=[count(1)])
+- Exchange SinglePartition, true, [id=#751]
   +- *(25) HashAggregate(keys=[], functions=[partial_count(1)])
      +- *(25) HashAggregate(keys=[c_last_name#61, c_first_name#60, d_date#26], functions=[])
         +- Exchange hashpartitioning(c_last_name#61, c_first_name#60, d_date#26, 5), true, [id=#694]
            +- *(24) HashAggregate(keys=[c_last_name#61, c_first_name#60, d_date#26], functions=[])
               +- SortMergeJoin [coalesce(c_last_name#61, ), isnull(c_last_name#61), coalesce(c_first_name#60, ), isnull(c_first_name#60), coalesce(d_date#26, 0), isnull(d_date#26)], [coalesce(c_last_name#221, ), isnull(c_last_name#221), coalesce(c_first_name#220, ), isnull(c_first_name#220), coalesce(d_date#186, 0), isnull(d_date#186)], LeftAnti
                  :- ...
```

### Why are the changes needed?

Performance improvements - few TPCDS queries have these kinds of duplicate aggregates.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

UT

Benchmarks (sf=5):

OpenJDK 64-Bit Server VM 1.8.0_265-b01 on Linux 5.8.13-arch1-1
Intel(R) Core(TM) i5-6500 CPU 3.20GHz

| Query | Before | After | Speedup |
| ----- | ------ | ----- | ------- |
| q14a | 44s | 44s | 1x |
| q14b | 41s | 41s | 1x |
| q38 | 6.5s | 5.9s | 1.1x |
| q87 | 7.2s | 6.8s | 1.1x |
| q14a-v2.7 | 55s | 53s | 1x |

Closes #30018 from tanelk/SPARK-33122.

Lead-authored-by: [email protected] <[email protected]>
Co-authored-by: Tanel Kiis <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>

SPARK_DAEMON_OPTS -> SPARK_DAEMON_JAVA_OPTS

70c41f9

asfgit closed this in ba96bb3 May 13, 2014

andrewor14 deleted the spark-daemon-opts branch May 13, 2014 23:26
