博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
mapreduce 多种输入
阅读量:4604 次
发布时间:2019-06-09

本文共 3301 字,大约阅读时间需要 11 分钟。

1.多路径输入

1)FileInputFormat.addInputPath 多次调用加载不同路径

FileInputFormat.addInputPath(job, new Path("hdfs://RS5-112:9000/cs/path1"));

FileInputFormat.addInputPath(job, new Path("hdfs://RS5-112:9000/cs/path2"));

2)FileInputFormat.addInputPaths一次调用加载 多路径字符串用逗号隔开

FileInputFormat.addInputPaths(job, "hdfs://RS5-112:9000/cs/path1,hdfs://RS5-112:9000/cs/path2");

2.多种输入

MultipleInputs可以加载不同路径的输入文件,并且每个路径可用不同的maper

MultipleInputs.addInputPath(job, new Path("hdfs://RS5-112:9000/cs/path1"), TextInputFormat.class,MultiTypeFileInput1Mapper.class);

MultipleInputs.addInputPath(job, new Path("hdfs://RS5-112:9000/cs/path3"), TextInputFormat.class,MultiTypeFileInput3Mapper.class);

例子:

package example;

import .io.IOException;
import org.apache..conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
/**
 * 多类型文件输入
 * @author lijl
 *
 */
public class MultiTypeFileInputMR {
static class MultiTypeFileInput1Mapper extends Mapper<LongWritable, Text, Text, Text>{
public void map(LongWritable key,Text value,Context context){
try {
String[] str = value.toString().split("\\|");
context.write(new Text(str[0]), new Text(str[1]));
} catch (IOException e) {
e.printStackTrace();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
static class MultiTypeFileInput3Mapper extends Mapper<LongWritable, Text, Text, Text>{
public void map(LongWritable key,Text value,Context context){
try {
String[] str = value.toString().split("");
context.write(new Text(str[0]), new Text(str[1]));
} catch (IOException e) {
e.printStackTrace();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
static class MultiTypeFileInputReducer extends Reducer<Text, Text, Text, Text>{
public void reduce(Text key,Iterable<Text> values,Context context){
try {
for(Text value:values){
context.write(key,value);
}
} catch (IOException e) {
e.printStackTrace();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
Configuration conf = new Configuration();
conf.set("mapred.textoutputformat.separator", ",");
Job job = new Job(conf,"MultiPathFileInput");
job.setJarByClass(MultiTypeFileInputMR.class);
FileOutputFormat.setOutputPath(job, new Path("hdfs://RS5-112:9000/cs/path6"));
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setReducerClass(MultiTypeFileInputReducer.class);
job.setNumReduceTasks(1);
MultipleInputs.addInputPath(job, new Path("hdfs://RS5-112:9000/cs/path1"), TextInputFormat.class,MultiTypeFileInput1Mapper.class);
MultipleInputs.addInputPath(job, new Path("hdfs://RS5-112:9000/cs/path3"), TextInputFormat.class,MultiTypeFileInput3Mapper.class);
System.exit(job.waitForCompletion(true)?0:1);
}
}

转载于:https://www.cnblogs.com/mfryf/p/5455448.html

你可能感兴趣的文章
MySQL建表语句的一些特殊字段
查看>>
《Unix环境高级编程》读书笔记 第8章-进程控制
查看>>
腾讯前端二面题目详解
查看>>
mascara-1
查看>>
Jquery Form表单取值
查看>>
Android API level 与version对应关系
查看>>
Team Name
查看>>
String类
查看>>
西门子_TDC_数据耦合小经验
查看>>
接口测试与postman
查看>>
LINQ To XML的一些方法
查看>>
[LeetCode] Copy List with Random Pointer
查看>>
openstack部署之nova
查看>>
JS组件系列——表格组件神器:bootstrap table
查看>>
存储过程Oracle(一)
查看>>
log4j日志归档
查看>>
Java笔记01——IO流
查看>>
mysql遇见error,1049
查看>>
NYOJ311 完全背包
查看>>
codevs——2822 爱在心中
查看>>