Hive throws an LZO "Premature EOF from inputStream" error
Published: 2019-06-14


Today a colleague on the DW team emailed asking for help with a problem they couldn't crack on their own. Here is how it was solved:

1. The HQL

insert overwrite table mds_prod_silent_atten_user partition (dt=20141110)
select uid, host, atten_time
from (
  select uid, host, atten_time
  from (
    select case when t2.uid is null then t1.uid else t2.uid end uid,
           case when t2.uid is null and t2.host is null then t1.host else t2.host end host,
           case when t2.atten_time is null or t1.atten_time > t2.atten_time then t1.atten_time else t2.atten_time end atten_time
    from (
      select uid, findid(extend,'uids') host, dt atten_time,
             sum(case when (mode = '1' or mode = '3') then 1 else -1 end) num
      from ods_bhv_tblog
      where behavior = '14000076' and dt = '20141115'
        and (mode = '1' or mode = '3' or mode = '2') and status = '1'
      group by uid, findid(extend,'uids'), dt
    ) t1
    full outer join (
      select uid, attened_uid host, atten_time
      from mds_prod_silent_atten_user
      where dt = '20141114'
    ) t2
    on t1.uid = t2.uid and t1.host = t2.host
    where t1.uid is null or t1.num > 0
  ) t3
  union all
  select t5.uid, t5.host, t5.atten_time
  from (
    select uid, host, atten_time
    from (
      select uid, findid(extend,'uids') host, dt atten_time,
             sum(case when (mode = '1' or mode = '3') then 1 else -1 end) num
      from ods_bhv_tblog
      where behavior = '14000076' and dt = '20141115'
        and (mode = '1' or mode = '3' or mode = '2') and status = '1'
      group by uid, findid(extend,'uids'), dt
    ) t4
    where num = 0
  ) t5
  join (
    select uid, attened_uid host, atten_time
    from mds_prod_silent_atten_user
    where dt = '20141114'
  ) t6
  on t6.uid = t5.uid and t6.host = t5.host
) t7

That is the failing HQL in full. It looks complicated, but the logic is actually simple: it only joins two tables, mds_prod_silent_atten_user and ods_bhv_tblog.

2. The error log:

Error: java.io.IOException: java.lang.reflect.InvocationTargetException
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:302)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:249)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:363)
	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:591)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:168)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:409)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1550)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:288)
	... 11 more
Caused by: java.io.EOFException: Premature EOF from inputStream
	at com.hadoop.compression.lzo.LzopInputStream.readFully(LzopInputStream.java:75)
	at com.hadoop.compression.lzo.LzopInputStream.readHeader(LzopInputStream.java:114)
	at com.hadoop.compression.lzo.LzopInputStream.<init>(LzopInputStream.java:54)
	at com.hadoop.compression.lzo.LzopCodec.createInputStream(LzopCodec.java:83)
	at org.apache.hadoop.hive.ql.io.RCFile$ValueBuffer.<init>(RCFile.java:667)
	at org.apache.hadoop.hive.ql.io.RCFile$Reader.<init>(RCFile.java:1431)
	at org.apache.hadoop.hive.ql.io.RCFile$Reader.<init>(RCFile.java:1342)
	at org.apache.hadoop.hive.ql.io.rcfile.merge.RCFileBlockMergeRecordReader.<init>(RCFileBlockMergeRecordReader.java:46)
	at org.apache.hadoop.hive.ql.io.rcfile.merge.RCFileBlockMergeInputFormat.getRecordReader(RCFileBlockMergeInputFormat.java:38)
	at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
	... 16 more

The log shows a "Premature EOF from inputStream" error while reading LZO-compressed data, and the error occurs in stage-3.

3. The execution plan for stage-3:

Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            Union
              Statistics: Num rows: 365 Data size: 146323 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: _col0 (type: string), _col1 (type: string), _col2 (type: string)
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 365 Data size: 146323 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 365 Data size: 146323 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.hive.ql.io.RCFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.RCFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe
                      name: default.mds_prod_silent_atten_user
          TableScan
            Union
              Statistics: Num rows: 365 Data size: 146323 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: _col0 (type: string), _col1 (type: string), _col2 (type: string)
                outputColumnNames: _col0, _col1, _col2
                Statistics: Num rows: 365 Data size: 146323 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 365 Data size: 146323 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.hive.ql.io.RCFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.RCFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe
                      name: default.mds_prod_silent_atten_user
Stage-3 has only a map phase, no reduce, and the map merely performs a union; nothing about it looks special.

4. Tracking down the cause

Googling the "lzo Premature EOF from inputStream" message turned up people who had run into a similar problem. Link:

The cause:

If the output format is TextOutputFormat, use LzopCodec; the matching input format for reading that output is LzoTextInputFormat.

If the output format is SequenceFileOutputFormat, use LzoCodec; the matching input format for reading that output is SequenceFileInputFormat.

If you write a SequenceFile with LzopCodec, then brace yourself for "java.io.EOFException: Premature EOF from inputStream" when SequenceFileInputFormat reads that output back.
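The pairings above can be sketched as Hive session settings. This is only a sketch: the property names are the standard Hadoop output-compression ones, and the codec classes are the ones shipped in the hadoop-lzo package.

```sql
-- Sketch of the codec/format pairings described above.

-- Plain text output: pair LzopCodec with LzoTextInputFormat on the read side.
SET hive.exec.compress.output=true;
SET mapreduce.output.fileoutputformat.compress=true;
SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec;

-- SequenceFile (and, as in our case, RCFile) output: use LzoCodec instead,
-- read back with the normal SequenceFileInputFormat / RCFileInputFormat.
SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzoCodec;
```

The key point is that LzopCodec writes an lzop file header, which only the lzop-aware input formats expect; block-oriented container formats carry their own framing and want the headerless LzoCodec stream.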

The description at that link matches our situation. Our table's output format is RCFileOutputFormat, not plain text, so the compression codec must not be LzopCodec; it should be LzoCodec. The error message confirms this: the failure happens while reading the RCFile that the previous job produced with LzopCodec compression.

With the explanation in hand, the next step was to find the corresponding parameter, the one controlling the compression codec of the reduce output, and swap its LZO codec for LzoCodec. From the failing job's configuration:

Sure enough, the mapreduce.output.fileoutputformat.compress.codec option was set to LzopCodec. Changing the value of that option to org.apache.hadoop.io.compress.DefaultCodec, so that LzoCodec is used by default, is all it takes.
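The fix can be applied per-session before running the HQL. A minimal sketch, assuming the hadoop-lzo classes are on the cluster classpath:

```sql
-- Replace the mismatched LzopCodec setting from the failing job's config.
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.DefaultCodec;

-- Alternatively, to keep LZO compression for the RCFile output explicitly:
-- SET mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzoCodec;
```

Setting it cluster-wide in mapred-site.xml works too, but a session-level SET limits the change to the jobs that need it.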

Reposted from: https://www.cnblogs.com/blfshiye/p/5424097.html
