Hadoop - More than One Reducer and One Output File

June 15, 2010

In my Hadoop code I have 4 reducers, so I always get 4 output files. That is quite normal, since each reducer puts its result in its own file. My question is: how can I get one, and only one, output file?
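Why 4 reducers give 4 files: by default, each record is routed to a reducer by its key, and each reducer writes its own file. The sketch below is a pure-Java illustration with no Hadoop dependency. It copies the formula used by Hadoop's default HashPartitioner, `(hashCode & Integer.MAX_VALUE) % numReduceTasks`. The class name `PartitionSketch` is hypothetical. It assumes plain `String` keys, whose `hashCode` differs from that of a Hadoop `Text` key, so the exact assignment in a real job will differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration (no Hadoop dependency) of default hash partitioning.
// String keys stand in for Writable keys, so real assignments will differ.
class PartitionSketch {

    // Same formula as Hadoop's HashPartitioner.getPartition.
    static int partition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Groups keys by the reducer (and therefore the output file) they would go to.
    static Map<Integer, List<String>> assign(List<String> keys, int numReduceTasks) {
        Map<Integer, List<String>> byReducer = new TreeMap<>();
        for (String key : keys) {
            byReducer.computeIfAbsent(partition(key, numReduceTasks), r -> new ArrayList<>())
                     .add(key);
        }
        return byReducer;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("apple", "banana", "cherry", "date", "elderberry", "fig");
        System.out.println("4 reducers: " + assign(keys, 4)); // keys spread over several files
        System.out.println("1 reducer:  " + assign(keys, 1)); // everything lands in file 0
    }
}
```

With a single reduce task, every key maps to partition 0, which is why one reducer yields one output file.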

The problem is that I have an iterative MapReduce job: it takes an input file, divides it into chunks, and gives each chunk to a mapper. That's why I have to gather the results of all reducers into one output file. I can then divide that file equally into 4 parts, give each part to one mapper, and so on.
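The gather-then-split step described above can be sketched outside Hadoop. The sketch reads every reducer's part file in partition order, concatenates the records, and cuts the result into N contiguous chunks whose sizes differ by at most one. This is a hypothetical helper (`MergeAndSplit` is not a Hadoop API). It assumes local files rather than HDFS, line-oriented records, and reducer files named `part-r-NNNNN`.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical local-filesystem helper: merge reducer outputs, then split into equal parts.
class MergeAndSplit {

    // Reads all files named part-r-* in dir, in name order (zero-padded, so partition order).
    static List<String> mergeReducerOutputs(Path dir) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "part-r-*")) {
            stream.forEach(parts::add);
        }
        parts.sort(null); // natural ordering of Path
        List<String> records = new ArrayList<>();
        for (Path p : parts) {
            records.addAll(Files.readAllLines(p));
        }
        return records;
    }

    // Splits records into n contiguous chunks whose sizes differ by at most one.
    static List<List<String>> split(List<String> records, int n) {
        List<List<String>> chunks = new ArrayList<>();
        int base = records.size() / n;
        int extra = records.size() % n;
        int start = 0;
        for (int i = 0; i < n; i++) {
            int size = base + (i < extra ? 1 : 0); // first `extra` chunks get one more record
            chunks.add(new ArrayList<>(records.subList(start, start + size)));
            start += size;
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        // Simulate the output directory of a job with 4 reducers.
        Path dir = Files.createTempDirectory("output");
        for (int r = 0; r < 4; r++) {
            Files.write(dir.resolve(String.format("part-r-%05d", r)),
                        List.of("rec" + (2 * r), "rec" + (2 * r + 1)));
        }
        List<String> merged = mergeReducerOutputs(dir);
        System.out.println(split(merged, 4)); // 4 inputs for the next round of mappers
    }
}
```

Splitting contiguously keeps the merged order intact. If the next iteration does not care about order, a round-robin split would balance sizes just as well.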

You can try MultipleOutputs, which lets you specify the output file each reducer should write to. For example, in your reducer code:

    ...
    private MultipleOutputs<YourKey, YourValue> out;

    @Override
    public void setup(Context context) {
        out = new MultipleOutputs<YourKey, YourValue>(context);
    }

    @Override
    public void reduce(YourKey key, Iterable<YourValue> values, Context context)
            throws IOException, InterruptedException {
        // ... compute yourResult (a YourValue) from values ...
        // Instead of writing through the context, write through MultipleOutputs:
        // context.write(key, yourResult);
        out.write(key, yourResult, "path/filename");
    }

    @Override
    public void cleanup(Context context) throws IOException, InterruptedException {
        out.close(); // flush and close every file opened by MultipleOutputs
    }
    ...

In this case you also need to adjust the job configuration:

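    ...
    job.setOutputFormatClass(NullOutputFormat.class);
    // LazyOutputFormat only creates an output file once a record is written to it,
    // so no empty default part files are produced. FileOutputFormat is abstract,
    // so a concrete subclass such as TextOutputFormat is needed here.
    LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
    FileOutputFormat.setOutputPath(job, new Path("output"));
    ...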

In this case each reducer's output is written under output/path/, to a file named filename-r-NNNNN, where NNNNN is that reducer's partition number.
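To see where the reducers' files end up, the sketch below mimics the naming convention Hadoop's FileOutputFormat applies to a reduce task's file: the base output path, then `-r-`, then the partition number zero-padded to five digits. `OutputNames` is a hypothetical helper, not a Hadoop API. It assumes an uncompressed text output, so no file extension is added.

```java
// Hypothetical helper mimicking FileOutputFormat's per-reduce-task file naming.
class OutputNames {

    // base name + "-r-" + partition number padded to five digits
    static String reduceOutputName(String baseOutputPath, int partition) {
        return String.format("%s-r-%05d", baseOutputPath, partition);
    }

    public static void main(String[] args) {
        // Four reducers all calling out.write(key, value, "path/filename"):
        for (int r = 0; r < 4; r++) {
            System.out.println("output/" + reduceOutputName("path/filename", r));
        }
    }
}
```

Because the partition number is part of the name, reducers sharing one base path still produce one file each, side by side in the same directory.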
