推广 热搜： 行业设备金系统参数经纪杯教师机械中国

hive查询where join_hive left join on 条件与where条件效率优化

日期：2025-01-02 作者：b4n9j caijiyuan 评论：0 移动：http://mip.xhstdz.com/news/12878.html

核心提示：参考网址：http://shiyanjun.cn/archives/588.html在hive执行sql过程中，一条sql执行了一个多小时，检查sql发

参考网址：http://shiyanjun.cn/archives/588.html

在hive执行sql过程中，一条sql执行了一个多小时，检查sql发现，在left

join之后，on只使用了关联字段，而其他筛选条件都在where之中使用。经过修改后，将where字段放在on

条件中进行判断，执行效率提高到半小时之内结束。在on条件中使用条件，可以发现筛选分区后再进行关联操作，splits只有12个

INFO : number of splits:12

，而where中使用条件，则是关联之后再进行分区筛选，分片数达到了94个splits，而且需要执行94个map，412个reduce，消耗的资源是非常巨大的。

INFO : number of splits:94

原始hql：

优化后hql:

查看执行计划发现，on中使用分区字段过滤，则执行计划中在scan表的时候就已经过滤掉分区了，而如果在where条件中过滤，则不会在scan中过滤，输入数据量巨大！

------------------------------------执行过程日志如下：

INFO : Query ID =

hive_20171026031919_9b1b59f2-b867-466a-95a5-06b5be6daafa

INFO : Total jobs = 5

INFO : Starting task [Stage-12:MAPREDLOCAL] in

serial mode

INFO : Execution completed successfully

INFO : MapredLocal task succeeded

INFO : Launching Job 1 out of 5

INFO : Starting task [Stage-10:MAPRED] in

serial mode

INFO : Number of reduce tasks is set to 0

since there's no reduce operator

INFO : number of splits:12

INFO : Submitting tokens for job:

job_1508725411753_0155

INFO : Kind: HDFS_DELEGATION_TOKEN, Service:

10.1.2.94:8020, Ident: (HDFS_DELEGATION_TOKEN token 999 for

hive)

INFO : The url to track the job:

http://host:8088/proxy/application_1508725411753_0155/

INFO : Starting Job = job_1508725411753_0155,

Tracking URL =

http://host:8088/proxy/application_1508725411753_0155/

INFO : Kill Command =

/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop/bin/hadoop

job -kill job_1508725411753_0155

INFO : Hadoop job information for Stage-10:

number of mappers: 12; number of reducers: 0

.........

INFO : 2017-10-26 03:21:44,169

Stage-10 map = 100%, reduce = 0%, Cumulative CPU

176.25 sec

INFO : MapReduce Total cumulative CPU time: 2

minutes 56 seconds 250 msec

INFO : Ended Job = job_1508725411753_0155

INFO : Starting task [Stage-9:CONDITIONAL] in

serial mode

INFO : Stage-11 is filtered out by condition

resolver.

INFO : Stage-2 is selected by condition

resolver.

INFO : Launching Job 2 out of 5

INFO : Starting task [Stage-2:MAPRED] in

serial mode

INFO : Number of reduce tasks not specified.

Estimated from input data size: 12

INFO : In order to change the average load for

a reducer (in bytes):

INFO : set

hive.exec.reducers.bytes.per.reducer=

INFO : In order to limit the maximum number of

reducers:

INFO : set

hive.exec.reducers.max=

INFO : In order to set a constant number of

reducers:

INFO : set

mapreduce.job.reduces=

INFO : number of splits:4

INFO : Submitting tokens for job:

job_1508725411753_0156

INFO : Kind: HDFS_DELEGATION_TOKEN, Service:

10.1.2.94:8020, Ident: (HDFS_DELEGATION_TOKEN token 1000 for

hive)

INFO : The url to track the job:

http://host:8088/proxy/application_1508725411753_0156/

INFO : Starting Job = job_1508725411753_0156,

Tracking URL =

http://host:8088/proxy/application_1508725411753_0156/

INFO : Kill Command =

/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop/bin/hadoop

job -kill job_1508725411753_0156

INFO : Hadoop job information for Stage-2:

number of mappers: 4; number of reducers: 12

INFO : 2017-10-26 03:21:56,128 Stage-2 map =

0%, reduce = 0%

.........

INFO : 2017-10-26 03:27:43,452

Stage-2 map = 100%, reduce = 100%, Cumulative CPU

981.83 sec

INFO : MapReduce Total cumulative CPU time: 16

minutes 21 seconds 830 msec

INFO : Ended Job = job_1508725411753_0156

INFO : Launching Job 3 out of 5

INFO : Starting task [Stage-3:MAPRED] in

serial mode

INFO : Number of reduce tasks not specified.

Estimated from input data size: 1

INFO : In order to change the average load for

a reducer (in bytes):

INFO : set

hive.exec.reducers.bytes.per.reducer=

INFO : In order to limit the maximum number of

reducers:

INFO : set

hive.exec.reducers.max=

INFO : In order to set a constant number of

reducers:

INFO : set

mapreduce.job.reduces=

INFO : number of splits:1

INFO : Submitting tokens for job:

job_1508725411753_0158

INFO : Kind: HDFS_DELEGATION_TOKEN, Service:

10.1.2.94:8020, Ident: (HDFS_DELEGATION_TOKEN token 1002 for

hive)

INFO : The url to track the job:

http://host:8088/proxy/application_1508725411753_0158/

INFO : Starting Job = job_1508725411753_0158,

Tracking URL =

http://host:8088/proxy/application_1508725411753_0158/

INFO : Kill Command =

/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop/bin/hadoop

job -kill job_1508725411753_0158

INFO : Hadoop job information for Stage-3:

number of mappers: 1; number of reducers: 1

INFO : 2017-10-26 03:28:00,533 Stage-3 map =

0%, reduce = 0%

INFO : 2017-10-26 03:28:13,143 Stage-3 map =

100%, reduce = 0%, Cumulative CPU 6.57 sec

INFO : 2017-10-26 03:28:28,700 Stage-3 map =

100%, reduce = 100%, Cumulative CPU 15.46 sec

INFO : MapReduce Total cumulative CPU time: 15

seconds 460 msec

INFO : Ended Job = job_1508725411753_0158

INFO : Launching Job 4 out of 5

INFO : Starting task [Stage-4:MAPRED] in

serial mode

INFO : Number of reduce tasks determined at

compile time: 1

INFO : In order to change the average load for

a reducer (in bytes):

INFO : set

hive.exec.reducers.bytes.per.reducer=

INFO : In order to limit the maximum number of

reducers:

INFO : set

hive.exec.reducers.max=

INFO : In order to set a constant number of

reducers:

INFO : set

mapreduce.job.reduces=

INFO : number of splits:1

INFO : Submitting tokens for job:

job_1508725411753_0159

INFO : Kind: HDFS_DELEGATION_TOKEN, Service:

10.1.2.94:8020, Ident: (HDFS_DELEGATION_TOKEN token 1004 for

hive)

INFO : The url to track the job:

http://host:8088/proxy/application_1508725411753_0159/

INFO : Starting Job = job_1508725411753_0159,

Tracking URL =

http://host:8088/proxy/application_1508725411753_0159/

INFO : Kill Command =

/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop/bin/hadoop

job -kill job_1508725411753_0159

INFO : Hadoop job information for Stage-4:

number of mappers: 1; number of reducers: 1

INFO : 2017-10-26 03:28:42,569 Stage-4 map =

0%, reduce = 0%

INFO : 2017-10-26 03:28:56,012 Stage-4 map =

100%, reduce = 0%, Cumulative CPU 7.64 sec

INFO : 2017-10-26 03:29:09,498 Stage-4 map =

100%, reduce = 100%, Cumulative CPU 13.13 sec

INFO : MapReduce Total cumulative CPU time: 13

seconds 130 msec

INFO : Ended Job = job_1508725411753_0159

INFO : Starting task [Stage-0:MOVE] in serial

mode

INFO : Loading data to table otl.oel_test from

hdfs://test94.eformax.com:8020/user/hive/warehouse/otl.db/oel_test/.hive-staging_hive_2017-10-26_03-19-40_941_4069260026444954348-5/-ext-10000

INFO : Starting task [Stage-5:STATS] in serial

mode

INFO : Table otl.oel_test stats: [numFiles=1,

numRows=188992, totalSize=15418476, rawDataSize=15229484]

INFO : MapReduce Jobs

Launched:

INFO : Stage-Stage-10: Map:

12 Cumulative CPU: 176.25

sec HDFS Read: 3308304990 HDFS

Write: 500780832 SUCCESS

INFO : Stage-Stage-2: Map: 4

Reduce: 12 Cumulative CPU:

981.83 sec HDFS Read: 802041177

HDFS Write: 15495161 SUCCESS

INFO : Stage-Stage-3: Map: 1

Reduce: 1 Cumulative CPU: 15.46

sec HDFS Read: 15527532 HDFS

Write: 17183487 SUCCESS

INFO : Stage-Stage-4: Map: 1

Reduce: 1 Cumulative CPU: 13.13

sec HDFS Read: 17193170 HDFS

Write: 15418554 SUCCESS

INFO : Total MapReduce CPU Time Spent: 19

minutes 46 seconds 670 msec

INFO : Completed executing

command(queryId=hive_20171026031919_9b1b59f2-b867-466a-95a5-06b5be6daafa);

Time taken: 565.401 seconds

INFO : OK

本文地址：http://sjzytwl.xhstdz.com/news/12878.html 物流园资讯网 http://sjzytwl.xhstdz.com/ , 查看更多

特别提示：本信息由相关用户自行提供，真实性未证实，仅供参考。请谨慎采用，风险自负。

更多>同类最新文章

0 条相关评论

文章列表

相关文章

李开复：DeepSeek 的成功证明闭源是“死路”，开源才有更大发展

最新动态

推荐图文

最新文章

点击排行

• 电信卡怎么异地销户销户流程是什么手机卡停机	• 梦金园去年净利同比降17.8%至1.89亿元海外增7
• 怎么找回qq密码qq密码忘记了手机号也换了怎么办	• 大屏智能千元机酷派5860价格欲破1100元酷派586
• 计步器怎么删掉手机自动计步器「计步器怎么删掉	• 三星今天上午宣布Note7爆炸原因已导致损失超10
• CBA八强出炉！下半区比上半区惨烈，9-12最终排	• 财知道｜爆卖上百万件！巴掌大的小东西，是年
• Z趋势：4G高速时代机会与挑战并存4g手机价格「	• 华为手机工程模式怎么进入(华为手机工程模式怎