hadoop - More than One Reducer and One Output File
In my Hadoop code I have 4 reducers, and I get 4 output files, which is quite normal: each reducer puts its result in one file. My question: how can I keep more than one reducer but still end up with just one output file?
The problem is that I have an iterative MapReduce job: it takes an input file, divides it into chunks, and gives each chunk to a mapper. That is why I have to gather the reducers' results and put them into one output file, so that the output file can be divided in the same way into 4 parts, each part given to one mapper, and so on.
You can try MultipleOutputs, where you can specify which output file each reducer should write to. For example, in your reducer code:
...
// requires: import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
private MultipleOutputs<YourKey, YourValue> out;

public void setup(Context context) {
    out = new MultipleOutputs<YourKey, YourValue>(context);
}

public void reduce(YourKey key, Iterable<YourValue> values, Context context)
        throws IOException, InterruptedException {
    // ...
    // Instead of writing through the context, use MultipleOutputs here:
    // context.write(key, yourResult);
    out.write(key, yourResult, "path/filename");
}

public void cleanup(Context context) throws IOException, InterruptedException {
    out.close();
}
...
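Note that the third argument to out.write is a base output path, interpreted relative to the job's output directory, so every reducer that passes the same "path/filename" writes under the same location.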
For this to work, you also need to ensure the job configuration is set up accordingly:
......
job.setOutputFormatClass(NullOutputFormat.class);
// LazyOutputFormat creates output files only when data is actually written;
// the wrapped format must be a concrete class such as TextOutputFormat.
LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
FileOutputFormat.setOutputPath(job, new Path("output"));
......
In this case, each reducer's output is written to output/path/filename.
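For completeness, here is a minimal driver sketch under the same assumptions; MyMapper and MyReducer are hypothetical class names (MyReducer being the MultipleOutputs reducer above), and Text key/value types are used as placeholders, so adapt them to your own job:

// A minimal driver sketch, assuming hypothetical MyMapper/MyReducer classes
// and Text key/value types; not the exact code from the question.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class Driver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "multiple-outputs-example");
        job.setJarByClass(Driver.class);

        job.setMapperClass(MyMapper.class);    // hypothetical mapper
        job.setReducerClass(MyReducer.class);  // hypothetical reducer using MultipleOutputs
        job.setNumReduceTasks(4);              // four reducers, as in the question

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // LazyOutputFormat avoids creating empty part-r-* files for reducers
        // that only write through MultipleOutputs.
        LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path("output"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}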