Hive query optimization: LEFT JOIN ON conditions vs. WHERE conditions
2025-01-02 17:07

Reference: http://shiyanjun.cn/archives/588.html

While running a SQL statement in Hive, one query took more than an hour to finish. Inspecting the SQL showed that after the LEFT JOIN, the ON clause used only the join key, while all the other filter conditions were applied in the WHERE clause. After the statement was modified so that those filter conditions were evaluated in the ON clause instead, the query finished in under half an hour.

With the conditions in ON, the partitions are filtered before the join is performed, and there are only 12 splits:

INFO  : number of splits:12

With the conditions in WHERE, the partition filter is applied only after the join, so the split count reaches 94, and the job needs 94 map tasks and 412 reduce tasks, consuming a huge amount of resources:

INFO  : number of splits:94

Original HQL:

Optimized HQL:
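The actual statements are not reproduced above, but the before/after pattern can be sketched as follows. All table and column names here (`fact`, `dim`, `dt`) are hypothetical placeholders, assuming the filter is on a partition column of the right-hand table:

```sql
-- Hypothetical slow version: the partition filter on the right table
-- sits in WHERE, so all partitions of dim are scanned and joined,
-- and only then filtered. (Note: with a LEFT JOIN, a WHERE filter on
-- the right table also discards the unmatched NULL rows, effectively
-- turning the query into an INNER JOIN.)
SELECT f.id, d.name
FROM fact f
LEFT JOIN dim d
  ON f.id = d.id
WHERE d.dt = '2017-10-25';

-- Hypothetical fast version: the partition filter moves into ON,
-- so the partition is pruned at scan time, before the join runs.
SELECT f.id, d.name
FROM fact f
LEFT JOIN dim d
  ON f.id = d.id
 AND d.dt = '2017-10-25';
```

Because the two forms are not always semantically identical for a LEFT JOIN, the result set should be verified after such a rewrite; in the case described here the rewrite cut the input from 94 splits to 12.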

Looking at the execution plan confirms this: when the partition field is filtered in the ON clause, the plan prunes the partitions already when scanning the table, whereas with the filter in the WHERE clause no pruning happens at scan time and the input data volume is huge.
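This can be checked with a plain EXPLAIN (table and column names here are hypothetical placeholders):

```sql
EXPLAIN
SELECT f.id, d.name
FROM fact f
LEFT JOIN dim d
  ON f.id = d.id
 AND d.dt = '2017-10-25';
```

In the ON version, the partition predicate should show up against the TableScan of `dim`, so the scan reads only the matching partition; in the WHERE version it appears only in a Filter operator after the join. `EXPLAIN DEPENDENCY` additionally lists the exact input partitions the query would read, which makes the difference easy to see.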

------------------------------------ Execution log ------------------------------------

INFO  : Query ID = hive_20171026031919_9b1b59f2-b867-466a-95a5-06b5be6daafa
INFO  : Total jobs = 5
INFO  : Starting task [Stage-12:MAPREDLOCAL] in serial mode
INFO  : Execution completed successfully
INFO  : MapredLocal task succeeded
INFO  : Launching Job 1 out of 5
INFO  : Starting task [Stage-10:MAPRED] in serial mode
INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
INFO  : number of splits:12
INFO  : Submitting tokens for job: job_1508725411753_0155
INFO  : Kind: HDFS_DELEGATION_TOKEN, Service: 10.1.2.94:8020, Ident: (HDFS_DELEGATION_TOKEN token 999 for hive)
INFO  : The url to track the job: http://host:8088/proxy/application_1508725411753_0155/
INFO  : Starting Job = job_1508725411753_0155, Tracking URL = http://host:8088/proxy/application_1508725411753_0155/
INFO  : Kill Command = /opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop/bin/hadoop job  -kill job_1508725411753_0155
INFO  : Hadoop job information for Stage-10: number of mappers: 12; number of reducers: 0
.........
INFO  : 2017-10-26 03:21:44,169 Stage-10 map = 100%,  reduce = 0%, Cumulative CPU 176.25 sec
INFO  : MapReduce Total cumulative CPU time: 2 minutes 56 seconds 250 msec
INFO  : Ended Job = job_1508725411753_0155
INFO  : Starting task [Stage-9:CONDITIONAL] in serial mode
INFO  : Stage-11 is filtered out by condition resolver.
INFO  : Stage-2 is selected by condition resolver.
INFO  : Launching Job 2 out of 5
INFO  : Starting task [Stage-2:MAPRED] in serial mode
INFO  : Number of reduce tasks not specified. Estimated from input data size: 12
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:4
INFO  : Submitting tokens for job: job_1508725411753_0156
INFO  : Kind: HDFS_DELEGATION_TOKEN, Service: 10.1.2.94:8020, Ident: (HDFS_DELEGATION_TOKEN token 1000 for hive)
INFO  : The url to track the job: http://host:8088/proxy/application_1508725411753_0156/
INFO  : Starting Job = job_1508725411753_0156, Tracking URL = http://host:8088/proxy/application_1508725411753_0156/
INFO  : Kill Command = /opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop/bin/hadoop job  -kill job_1508725411753_0156
INFO  : Hadoop job information for Stage-2: number of mappers: 4; number of reducers: 12
INFO  : 2017-10-26 03:21:56,128 Stage-2 map = 0%,  reduce = 0%
.........
INFO  : 2017-10-26 03:27:43,452 Stage-2 map = 100%,  reduce = 100%, Cumulative CPU 981.83 sec
INFO  : MapReduce Total cumulative CPU time: 16 minutes 21 seconds 830 msec
INFO  : Ended Job = job_1508725411753_0156
INFO  : Launching Job 3 out of 5
INFO  : Starting task [Stage-3:MAPRED] in serial mode
INFO  : Number of reduce tasks not specified. Estimated from input data size: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1508725411753_0158
INFO  : Kind: HDFS_DELEGATION_TOKEN, Service: 10.1.2.94:8020, Ident: (HDFS_DELEGATION_TOKEN token 1002 for hive)
INFO  : The url to track the job: http://host:8088/proxy/application_1508725411753_0158/
INFO  : Starting Job = job_1508725411753_0158, Tracking URL = http://host:8088/proxy/application_1508725411753_0158/
INFO  : Kill Command = /opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop/bin/hadoop job  -kill job_1508725411753_0158
INFO  : Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 1
INFO  : 2017-10-26 03:28:00,533 Stage-3 map = 0%,  reduce = 0%
INFO  : 2017-10-26 03:28:13,143 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 6.57 sec
INFO  : 2017-10-26 03:28:28,700 Stage-3 map = 100%,  reduce = 100%, Cumulative CPU 15.46 sec
INFO  : MapReduce Total cumulative CPU time: 15 seconds 460 msec
INFO  : Ended Job = job_1508725411753_0158
INFO  : Launching Job 4 out of 5
INFO  : Starting task [Stage-4:MAPRED] in serial mode
INFO  : Number of reduce tasks determined at compile time: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1508725411753_0159
INFO  : Kind: HDFS_DELEGATION_TOKEN, Service: 10.1.2.94:8020, Ident: (HDFS_DELEGATION_TOKEN token 1004 for hive)
INFO  : The url to track the job: http://host:8088/proxy/application_1508725411753_0159/
INFO  : Starting Job = job_1508725411753_0159, Tracking URL = http://host:8088/proxy/application_1508725411753_0159/
INFO  : Kill Command = /opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/hadoop/bin/hadoop job  -kill job_1508725411753_0159
INFO  : Hadoop job information for Stage-4: number of mappers: 1; number of reducers: 1
INFO  : 2017-10-26 03:28:42,569 Stage-4 map = 0%,  reduce = 0%
INFO  : 2017-10-26 03:28:56,012 Stage-4 map = 100%,  reduce = 0%, Cumulative CPU 7.64 sec
INFO  : 2017-10-26 03:29:09,498 Stage-4 map = 100%,  reduce = 100%, Cumulative CPU 13.13 sec
INFO  : MapReduce Total cumulative CPU time: 13 seconds 130 msec
INFO  : Ended Job = job_1508725411753_0159
INFO  : Starting task [Stage-0:MOVE] in serial mode
INFO  : Loading data to table otl.oel_test from hdfs://test94.eformax.com:8020/user/hive/warehouse/otl.db/oel_test/.hive-staging_hive_2017-10-26_03-19-40_941_4069260026444954348-5/-ext-10000
INFO  : Starting task [Stage-5:STATS] in serial mode
INFO  : Table otl.oel_test stats: [numFiles=1, numRows=188992, totalSize=15418476, rawDataSize=15229484]
INFO  : MapReduce Jobs Launched:
INFO  : Stage-Stage-10: Map: 12   Cumulative CPU: 176.25 sec   HDFS Read: 3308304990 HDFS Write: 500780832 SUCCESS
INFO  : Stage-Stage-2: Map: 4  Reduce: 12   Cumulative CPU: 981.83 sec   HDFS Read: 802041177 HDFS Write: 15495161 SUCCESS
INFO  : Stage-Stage-3: Map: 1  Reduce: 1   Cumulative CPU: 15.46 sec   HDFS Read: 15527532 HDFS Write: 17183487 SUCCESS
INFO  : Stage-Stage-4: Map: 1  Reduce: 1   Cumulative CPU: 13.13 sec   HDFS Read: 17193170 HDFS Write: 15418554 SUCCESS
INFO  : Total MapReduce CPU Time Spent: 19 minutes 46 seconds 670 msec
INFO  : Completed executing command(queryId=hive_20171026031919_9b1b59f2-b867-466a-95a5-06b5be6daafa); Time taken: 565.401 seconds
INFO  : OK
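As a side note, Hive does have a predicate-pushdown optimizer that can move some WHERE predicates down to the table scan automatically, but it cannot safely push a right-table filter through a LEFT JOIN (doing so would change the result), so writing the partition filter into the ON clause explicitly remains the reliable fix. The relevant settings, which are enabled by default in modern Hive releases, are:

```sql
-- Predicate pushdown: push eligible filters toward the scan.
set hive.optimize.ppd=true;
-- Also push the predicates into the storage layer where supported.
set hive.optimize.ppd.storage=true;
```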
