hadoop - MapReduce with multiple mappers and reducers
I have CSV files with data as follows:
lat,lng
18.1234,77.3443
18.345,77.335
18.356,77.345
Each file contains latitude and longitude values and is up to 1 MB in size. I need to calculate the distance between the latitude/longitude of the first record and that of the second record in the CSV,
i.e. between 18.1234,77.3443 and 18.345,77.335.
But a mapper reads one line at a time, so I am thinking of adding a delimiter ('|') between lines, so that all the records of the above CSV file become a single line, which is then the input to the mapper:
key -> filename, value -> the CSV records as one line of text (all records separated by the delimiter):
filename 18.1234,77.3443|18.345,77.335|18.356,77.345....
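The flattening step described above could be sketched roughly like this in plain Java, outside of Hadoop. The class name `CsvFlattener`, the method `flatten`, the sample file name and the tab between the filename and the records are all assumptions made for illustration; the only thing taken from the question is the '|' delimiter and the "filename, then records" layout.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: turns the lines of one lat,lng CSV file into a
// single record of the form "<filename>\t<rec1>|<rec2>|...".
class CsvFlattener {

    static String flatten(String fileName, List<String> csvLines) {
        List<String> rows = new ArrayList<>();
        for (String line : csvLines) {
            String trimmed = line.trim();
            // Skip blank lines and the "lat,lng" header row.
            if (trimmed.isEmpty() || trimmed.equalsIgnoreCase("lat,lng")) {
                continue;
            }
            rows.add(trimmed);
        }
        // The tab separator between key and value is an assumption.
        return fileName + "\t" + String.join("|", rows);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "lat,lng", "18.1234,77.3443", "18.345,77.335", "18.356,77.345");
        // Prints: track1.csv<TAB>18.1234,77.3443|18.345,77.335|18.356,77.345
        System.out.println(flatten("track1.csv", lines));
    }
}
```

With one such line per file, a line-oriented mapper would see a whole file's records at once instead of a single coordinate pair.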
In the reducer I would split on the delimiter and calculate the distance between subsequent records [first and second coordinates, and so on].
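The split-and-measure step could look roughly like the following sketch, again in plain Java so it can be tried without a cluster. The question does not say which distance formula is wanted; the haversine great-circle formula and the mean Earth radius of 6371 km are assumptions here, as are the names `ConsecutiveDistance`, `haversineKm` and `distances`. The output rows use a lat,lng,distance layout, with the distance measured from each record to the next one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a '|'-joined record and computes the
// distance between each record and the one that follows it.
class ConsecutiveDistance {

    // Mean Earth radius in kilometres (assumption).
    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance via the haversine formula (assumed metric).
    static double haversineKm(double lat1, double lng1, double lat2, double lng2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLng = Math.toRadians(lng2 - lng1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLng / 2) * Math.sin(dLng / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Returns rows "lat,lng,distanceKm"; n records yield n - 1 rows.
    static List<String> distances(String joinedRecords) {
        String[] records = joinedRecords.split("\\|");
        List<String> rows = new ArrayList<>();
        for (int i = 0; i + 1 < records.length; i++) {
            String[] from = records[i].split(",");
            String[] to = records[i + 1].split(",");
            double km = haversineKm(
                    Double.parseDouble(from[0]), Double.parseDouble(from[1]),
                    Double.parseDouble(to[0]), Double.parseDouble(to[1]));
            rows.add(records[i] + "," + km);
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String row : distances("18.1234,77.3443|18.345,77.335|18.356,77.345")) {
            System.out.println(row);
        }
    }
}
```

Inside a real job the same loop would sit in the reducer (or directly in the mapper, since each line already holds a whole file), with the rows written out instead of printed.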
So if I have 30 CSV files, I want 30 mappers and 30 reducers to process them. I also need to store the data in MySQL, as lat,lng,distance.
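If each CSV file is smaller than the default block size, each file should be processed by a single mapper, so you can get the ID of the current mapper and emit it as the key.
I believe you can get the ID with conf.get("mapred.tip.id") from the mapper's configuration.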