Study notes: problems and errors encountered when configuring and running Hadoop with Docker

Original post, last modified 2022-05-22 14:45:56

This content is licensed under CC 4.0 BY-SA.

First published 2022-05-20 23:43:37

Contents

- Problems encountered and their solutions
- Supplementary notes

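Docker version: 19.03.13
Hadoop version: 2.7.0

These notes cover setting up a pseudo-distributed Hadoop cluster with Docker.
Reference article: "Installing distributed Hadoop with Docker"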

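Problems encountered and their solutions

No NameNode process after configuring Hadoop
Reference: "Why is there no NameNode process after configuring Hadoop?"
Reference: "What to do when jps does not show the NameNode after Hadoop starts"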

Missing DataNode process
Reference: "Hadoop: fixing a missing DataNode"
Reference: "A fix for Hadoop daemons missing the SecondaryNameNode or DataNode"
Reference: "Hadoop: fixing a missing DataNode in the jps output"

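Why is there no current directory under tmp/data after formatting HDFS?
Solution: check hdfs-site.xml for configuration errors.

In my case, hdfs-site.xml contained an extra, invalid property, "dfs.datanode.name.dir". Hadoop 2.x has no such key; the storage directories are set with dfs.namenode.name.dir and dfs.datanode.data.dir. The offending entry was: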

<property>
	<name>dfs.datanode.name.dir</name>
	<value>file:///usr/local/hadoop-2.7.0/hadoop_tmp/dfs/name</value>
</property>

As a result, no current directory appeared under tmp/data after formatting HDFS, and the DataNode process could not start.
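
A mistyped key like this is easy to overlook, so a small check can help. The following is a minimal sketch, not part of Hadoop: it parses the XML with the JDK's own parser and lists every dfs.*.dir property name that is not on a short allow-list. The class name HdfsSiteCheck, the allow-list (which is not exhaustive), and the dfs/data path in the sample are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NodeList;

final class HdfsSiteCheck {
    // Assumed allow-list of storage-directory keys; real clusters may use others.
    private static final Set<String> KNOWN_DIR_KEYS = new HashSet<>(Arrays.asList(
            "dfs.namenode.name.dir",
            "dfs.datanode.data.dir",
            "dfs.namenode.checkpoint.dir",
            "dfs.namenode.edits.dir"));

    /** Returns every dfs.*.dir property name in the XML that is not on the allow-list. */
    static List<String> suspiciousDirKeys(String xml) {
        NodeList names;
        try {
            names = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getElementsByTagName("name");
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse configuration XML", e);
        }
        List<String> suspicious = new ArrayList<>();
        for (int i = 0; i < names.getLength(); i++) {
            String key = names.item(i).getTextContent().trim();
            if (key.startsWith("dfs.") && key.endsWith(".dir") && !KNOWN_DIR_KEYS.contains(key)) {
                suspicious.add(key);
            }
        }
        return suspicious;
    }

    public static void main(String[] args) {
        // Sample configuration: the first key is the invalid one from this post,
        // the second path (dfs/data) is a made-up example.
        String xml = "<configuration>"
                + "<property><name>dfs.datanode.name.dir</name>"
                + "<value>file:///usr/local/hadoop-2.7.0/hadoop_tmp/dfs/name</value></property>"
                + "<property><name>dfs.datanode.data.dir</name>"
                + "<value>file:///usr/local/hadoop-2.7.0/hadoop_tmp/dfs/data</value></property>"
                + "</configuration>";
        System.out.println(suspiciousDirKeys(xml)); // prints [dfs.datanode.name.dir]
    }
}
```

An allow-list is used here because Hadoop itself silently ignores unknown keys, which is why the typo produced no error message.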

When managing files through the Hadoop Java API, the following error appeared:

 WARN - I/O error constructing remote block reader.
 java.net.ConnectException: Connection timed out: no further information
......
 WARN - Failed to connect to /172.18.0.3:50010 for block, add to deadNodes and continue. java.net.ConnectException: Connection timed out: no further information
 java.net.ConnectException: Connection timed out: no further information
 ......
 Exception in thread "main" org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-804584029-172.18.0.3-1652929775016:blk_1073742202_1378 file=/user/hadooptest/tmp/test001.txt

Solution: edit hdfs-site.xml on both the local machine and the server, and add the following property:

    <property>
        <name>dfs.client.use.datanode.hostname</name>
        <value>true</value>
    </property>

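Then add an entry to the local hosts file that maps the DataNode's hostname to x.x.x.x (the IP address through which your own DataNode's port 50010 is reachable).
For details, see: https://stackoverflow.com/questions/45276427/unable-connect-to-docker-container-outside-docker-host#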
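
Before rerunning the HDFS client, it can be useful to confirm that the DataNode port is reachable from the local machine at all. The following is a minimal sketch using only the JDK: it attempts a TCP connection with a timeout and reports whether it succeeded. The class name DataNodePortCheck and the placeholder hostname "datanode-host" are assumptions; substitute the hostname you mapped in the hosts file.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class DataNodePortCheck {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or the hostname could not be resolved.
            return false;
        }
    }

    public static void main(String[] args) {
        // "datanode-host" is a placeholder, not a real name from this setup.
        String host = args.length > 0 ? args[0] : "datanode-host";
        // 50010 is the DataNode port that appears in the error log above.
        System.out.println(host + ":50010 reachable = " + canConnect(host, 50010, 3000));
    }
}
```

If this reports false for the mapped hostname, the problem is likely in the hosts entry or the Docker port publishing rather than in the Hadoop client configuration.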

When debugging Hadoop locally, the error org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z appeared.
Reference: "Fixing org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z"

Supplementary notes

Reference: "MapReduce explained in detail"
Reference: "The HDFS directory structure explained in detail"

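Installing Big Data Tools, the new IDEA plugin
Reference: "Fixes for the IDEA Plugins search returning no results: a roundup of methods found online (one of them should work for you)"
Reference: "IDEA 2019.3 plugin download: Plugins Nothing found"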

Connecting IntelliJ IDEA to a Hadoop cluster
Reference: "The Hadoop cluster web UI shows only one node and only one DataNode starts"

Connecting to a Hadoop cluster remotely from IDEA and operating on HDFS through the API

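Basics:
Reference: "Hadoop MapReduce: counting user visits to a website by date"
Reference: "Hadoop Big Data Development Basics series, part 4: introductory MapReduce programming"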

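During more advanced study, I did an exercise based on the article "Hadoop Big Data Development Basics series, part 5: advanced MapReduce programming" and found that its third item, "3. Optimizing the log file statistics program", always failed to run. The cause turned out to be a type mismatch.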

Solution:
Change each occurrence of <MemberLogTime, IntWritable,MemberLogTime,IntWritable>
to <LongWritable, Text,MemberLogTime,IntWritable>
The final version looks like this:

Reducer <LongWritable, Text,MemberLogTime,IntWritable>
...
Partitioner<LongWritable, Text>
...
Mapper<LongWritable,Text,MemberLogTime, IntWritable>
...
Reducer<LongWritable, Text,MemberLogTime,IntWritable>
...
job.setMapperClass(SelectLoginCountMapper.class); // set the custom Mapper class
job.setMapOutputKeyClass(LongWritable.class);
job.setMapOutputValueClass(Text.class);
job.setReducerClass(SelectLoginCountReducer.class); // set the custom Reducer class
job.setOutputKeyClass(LongWritable.class);
job.setOutputValueClass(Text.class);